Hi, all.
At our annual meeting, we had a brief discussion about our name. ASA's Council of Sections suggested that "text analysis" might be too narrow for a section. There is also the concern that "text analysis" might fit neatly under another section. FWIW, it didn't sound to me like either of these was a catastrophic blocker to us becoming a section. Even so, I want to pick up the discussion of naming here and put forward my suggestion for a section name.
"Corpus Statistics" or
"The Section for Corpus Statistics"
I think "Corpus Statistics" checks two boxes for us:
- The name covers linguistic phenomena broadly as a subfield of statistical study (i.e., it is less narrow than just "text")
- The name distinguishes us from both Natural Language Processing (NLP) and Linguistics by marking our approach as uniquely statistical
Natural Language Processing is a subfield of Artificial Intelligence and, as such, is inherently task-focused (extracting entities, producing summaries, etc.). It does not primarily focus on making inferences about populations from samples, which is inherently the task of statistics.
Meanwhile, Linguistics has a subfield called Corpus Linguistics that studies real-world samples of language to understand linguistic phenomena. While Corpus Linguistics is focused on language itself, Corpus Statistics studies samples of language as statistical phenomena. Here, language as a statistical phenomenon includes more than just text (though in practice, the first step of speech analytics is usually converting speech to text).
In addition to proposing the above name change, I suggest we use this thread for any discussion of re-naming, whether it's for "Corpus Statistics" or for another proposal. That way, we don't have to search through many threads to get at the broader discussion.
Very interested to hear what you all think!
------------------------------
Tommy Jones
------------------------------