  • 1.  How do writers differ quantitatively?

    Posted 03-28-2020 16:01
    How do FICTION writers differ quantitatively?  I have the following list.  What can be added?

    # characters per word (first used by De Morgan)
    # words per sentence
    % unique words
    use of frequent words
    use of sensory adjectives
    use of sentiment words
    use of positive or negative words
    verb/adjective ratio
    complexity (grade-level readability)
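
    A minimal sketch of how the first few measures in the list could be computed, assuming Python and a naive regex tokenizer.  The function name style_metrics and the sample text are made up for illustration, and the sentence splitter will mishandle abbreviations such as "Mr.":

```python
import re

def style_metrics(text):
    """Return a few simple stylometric measures for a passage of text."""
    # Naive sentence split: break after ., !, or ? when followed by whitespace.
    # This is an assumption; abbreviations will be split incorrectly.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Words are runs of letters, optionally with internal apostrophes.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text.lower())
    if not words or not sentences:
        raise ValueError("text contains no words")
    return {
        "chars_per_word": sum(len(w) for w in words) / len(words),
        "words_per_sentence": len(words) / len(sentences),
        "pct_unique_words": 100 * len(set(words)) / len(words),
    }

sample = "The cat sat. The cat ran away!"
metrics = style_metrics(sample)
print(metrics)
```

    Short passages inflate the % unique words figure, so comparisons between authors are probably only fair on samples of similar length.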

    Does anyone have new items?  Does anyone have some quantitatively-oriented English teachers who can weigh in?

    Jerry

    ------------------------------
    Jerry Tuttle
    Adjunct online math instructor

    ------------------------------


  • 2.  RE: How do writers differ quantitatively?

    Posted 03-30-2020 14:14
    Hi-

    I keep the following list of text-parsing features, grouped into the categories below, for the kind of analysis you are describing.  Much of it comes from Google NLP, but some comes from other sources as well.  I hope this helps add to your list:

    Text Components (Google calls this "Text Span"): an output piece of the overall text that has a central entity.

    Token (either word or term) identifiers, with the following attributes that define the token:

    - Parts of Speech: Adjective, Preposition, Postposition, Adverb, Conjunction, Determiner, Common Noun, Proper Noun, Cardinal number, Pronoun, Particle or other function word, Punctuation, Verb, Verb tense, Verb mode, Foreign word/term, Typo, Abbreviation, Emoticon, and Affix

    - Time flow during an event: Perfective, Imperfective, and Progressive

    - Tense: conditional, future, past, present, imperfect, pluperfect

    - Noun or pronoun case: accusative, adverbial, complement, dative, genitive, instrumental, locative, nominative, oblique, partitive, prepositional, reflexive, reflexive_case, relative, relative_case, and vocative

    - Grammatical mood: Conditional, Imperative, Indicative, Interrogative, Jussive, Subjunctive

    - Person: first, second, third, reflexive

    - Proper noun: yes/no flag

    - Voice: active, causative, passive

    Text, Text Component, Sentence, and Token calculations:
    - lexical diversity = number of unique words divided by count of words in overall text
    - average and median number of characters/words within the specific text component
    - ratios of token identifiers relative to the entire text component
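
    As a rough illustration of the length and ratio calculations above, here is a small Python sketch that works on tokens already tagged with a part of speech.  The function name token_stats, the tag labels, and the hand-tagged sample are assumptions for illustration; any tagger that produces (word, tag) pairs could feed it:

```python
from collections import Counter
from statistics import mean, median

def token_stats(tagged_tokens):
    """Summarize a list of (word, tag) pairs from any part-of-speech tagger."""
    # Exclude punctuation tokens from the word-length figures.
    lengths = [len(word) for word, tag in tagged_tokens if tag != "PUNCT"]
    tag_counts = Counter(tag for _, tag in tagged_tokens)
    total = len(tagged_tokens)
    adjectives = tag_counts["ADJ"]
    return {
        "mean_word_length": mean(lengths),
        "median_word_length": median(lengths),
        # Share of each tag among all tokens in the text component.
        "tag_ratios": {tag: n / total for tag, n in tag_counts.items()},
        # Undefined when the text has no adjectives.
        "verb_adj_ratio": tag_counts["VERB"] / adjectives if adjectives else None,
    }

# Hand-tagged sample (hypothetical data; the tag labels are illustrative).
sample = [
    ("The", "DET"), ("old", "ADJ"), ("dog", "NOUN"), ("barked", "VERB"),
    ("and", "CONJ"), ("ran", "VERB"), (".", "PUNCT"),
]
stats = token_stats(sample)
print(stats)
```

    Reporting the median alongside the mean helps because a handful of very long words or sentences can pull the mean well away from what is typical for the writer.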

    Sentence: If the text has sentence structure, the sentences and the number of sentences in the overall text.

    Token: Word or term identifiers:
    - Entity Analysis (think proper nouns): Named Person, Location, Organization, Event, Artwork, Consumer Good, Brand
    - Entities around Location details: phone, address (and all of its components), geo long-lat, other geographic markers
    - Entities around specific Amounts:  Date, Number, Currency

    Saliency: A score for each entity that indicates the importance or centrality of that entity to the entire text.

    Sentiment Score per Text Component

    Text Classification: The name of the category and a score of confidence that the portion of text meets the classification criteria

    Original Language: English, French, German, Spanish, Chinese - Mandarin, etc.

    ------------------------------
    Carol Haney
    Senior Research and Data Scientist, Distinguished
    Qualtrics
    ------------------------------