
  • 1.  Word Concordance

    Posted 03-18-2020 11:05
    Dear All,

    My apologies for what is likely a very basic question, but I am doing my first text analysis and am looking for some guidance.

    I am working with a colleague in our Romance language department.  She is analyzing a politician's speeches to see how they have changed over time.  She has identified two key words, and we are trying to assess the concordance of the words in proximity to each other and how that has shifted over time.  We are comparing the likelihood of key word #1 occurring within a certain # of words when key word #2 is observed vs. the likelihood of key word #1 occurring within a certain # of words when key word #2 is not observed.  Is this a common method of analyzing concordance of words?  If so, can someone share a citation with me?  Also, is there a recommended method for choosing the "within a certain # of words" limit?  And is there some obvious method of analyzing this that we are omitting?
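
    For concreteness, the comparison described above could be sketched in Python roughly as follows. This is only one possible reading of the setup, not an established method: it treats every token position as an observation, and the names `tokenize`, `window_rates`, `kw1`, `kw2`, and `k` are hypothetical, chosen for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of letters (accents kept)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def window_rates(tokens, kw1, kw2, k):
    """Compare how often kw1 appears within k words of kw2 vs. of other tokens.

    Assumption: each token position is one observation. The window covers
    the k tokens before and the k tokens after that position and excludes
    the position itself.
    Returns (rate at kw2 positions, rate at all other positions).
    """
    hits_kw2 = n_kw2 = hits_other = n_other = 0
    for i, tok in enumerate(tokens):
        window = tokens[max(0, i - k):i] + tokens[i + 1:i + 1 + k]
        near = kw1 in window
        if tok == kw2:
            n_kw2 += 1
            hits_kw2 += near
        else:
            n_other += 1
            hits_other += near
    rate_kw2 = hits_kw2 / n_kw2 if n_kw2 else float("nan")
    rate_other = hits_other / n_other if n_other else float("nan")
    return rate_kw2, rate_other
```

    Running `window_rates` separately on each speech (or each time period) for several values of `k` would show how the two rates shift over time and how sensitive the comparison is to the window size.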

    I appreciate any feedback you have on this.


    Sincerely,

    Michael

    ------------------------------
    Michael Posner
    Associate Professor of Statistics
    Director, Center for Statistics Education
    Villanova University
    ------------------------------


  • 2.  RE: Word Concordance

    Posted 03-18-2020 11:34
    Hi Michael-

    The technique you describe is common in natural language processing (see the Wikipedia page on co-occurrence networks).


    A few years ago, researchers at the University of Nebraska-Lincoln took a similar approach with Jane Austen's entire corpus (except for her letters), looking at which words are often used together and which are not, and at the distances between those words, I *think* using a TF-IDF matrix.  I will dig up the paper and the authors and send them along.

    Best,

    Carol


    https://en.wikipedia.org/wiki/Co-occurrence_network

    ------------------------------
    Carol Haney
    Senior Research and Data Scientist, Distinguished
    Qualtrics
    ------------------------------