10 Part-of-speech tagging and word sense disambiguation

 

This chapter covers

  • Disambiguating language by predicting nouns, verbs, and adjectives from past data
  • Making decisions and explaining them using Hidden Markov Models
  • Using TensorFlow to model explainable problems and to collect evidence
  • Computing Hidden Markov Model initial, transition, and emission probabilities from existing data
  • Creating a part-of-speech tagger from your own data and larger corpora
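Computing initial, transition, and emission probabilities from existing data comes down to counting. The following is a minimal sketch in plain Python rather than the TensorFlow code this chapter builds. It assumes a tiny hand-written tagged corpus; the names `CORPUS`, `estimate_hmm`, and `viterbi` are hypothetical and exist only for illustration. The sketch computes maximum-likelihood probabilities from tag and word counts, then uses Viterbi decoding to pick the most likely tag sequence, so the ambiguous word "book" is tagged as a verb in one sentence and a noun in another.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (illustrative only): each sentence is a list of (word, tag) pairs.
CORPUS = [
    [("i", "PRON"), ("read", "VERB"), ("a", "DET"), ("book", "NOUN")],
    [("book", "VERB"), ("a", "DET"), ("flight", "NOUN")],
    [("they", "PRON"), ("book", "VERB"), ("a", "DET"), ("room", "NOUN")],
    [("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("good", "ADJ")],
]


def normalize(counter):
    """Turn raw counts into a probability distribution that sums to 1."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


def estimate_hmm(corpus):
    """Maximum-likelihood initial, transition, and emission probabilities from counts."""
    initial = Counter()                  # how often each tag starts a sentence
    transitions = defaultdict(Counter)   # transitions[prev_tag][next_tag]
    emissions = defaultdict(Counter)     # emissions[tag][word]
    for sentence in corpus:
        tags = [tag for _, tag in sentence]
        initial[tags[0]] += 1
        for prev_tag, next_tag in zip(tags, tags[1:]):
            transitions[prev_tag][next_tag] += 1
        for word, tag in sentence:
            emissions[tag][word] += 1
    return (
        normalize(initial),
        {tag: normalize(c) for tag, c in transitions.items()},
        {tag: normalize(c) for tag, c in emissions.items()},
    )


def viterbi(words, initial, transitions, emissions):
    """Most probable tag sequence for `words`; unseen events get probability 0 (no smoothing)."""
    tags = list(emissions)
    # best[t] = (probability, path) of the best path that ends in tag t at the current word
    best = {t: (initial.get(t, 0.0) * emissions[t].get(words[0], 0.0), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            emit = emissions[t].get(word, 0.0)
            # Extend every previous path into tag t and keep the most probable one.
            prob, path = max(
                (best[p][0] * transitions.get(p, {}).get(t, 0.0) * emit, best[p][1])
                for p in tags
            )
            new_best[t] = (prob, path + [t])
        best = new_best
    return max(best.values())[1]


if __name__ == "__main__":
    model = estimate_hmm(CORPUS)
    print(viterbi(["book", "a", "flight"], *model))         # ['VERB', 'DET', 'NOUN']
    print(viterbi(["they", "read", "the", "book"], *model))  # ['PRON', 'VERB', 'DET', 'NOUN']
```

Because the sketch does no smoothing, any word or tag transition missing from the training data drives a path's probability to zero. Taggers trained on real corpora usually smooth the counts and work with log probabilities to avoid numeric underflow on long sentences.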

You use language every day to communicate with others, and if you are like me, it sometimes makes you scratch your head, especially when that language is English. English is known for its many exceptions, which make it difficult to teach to non-native speakers and to your little ones who are growing up trying to learn it themselves. Context matters. In conversation you can lean on other tools, such as hand motions, facial expressions, long pauses, and other visual cues, to convey additional context or meaning. When you read language as written text, however, much of that context is missing, and a lot of ambiguity remains. Parts of speech (PoS) can help fill in that missing context by disambiguating words so that you can make sense of them in text. A word's PoS tells you whether it is being used as an action (verb), whether it refers to an object (noun), whether it describes a noun (adjective), and so on. For example, "book" is a noun in "read a book" but a verb in "book a flight."

10.1       Quick review of the HMM example: Rainy or Sunny, and what it's actually doing

10.2       Part-of-speech (PoS) tagging

10.2.1   The big picture: training and predicting PoS with HMMs

10.2.2   Generating the ambiguity PoS-tagged dataset

10.3       Algorithms for building the Hidden Markov Model (HMM) for PoS disambiguation

10.3.1   Generating the emission probabilities

10.4       Running the HMM and evaluating its output

10.5       Getting more training data using the Brown corpus

10.6       Defining error bars and metrics for PoS tagging

10.7       Summary