Appendix D. Natural language processing
We’ve used NLP throughout the book. NLP refers to a set of techniques and methods for processing written and spoken (usually human) languages. In practical terms, it helps us deal with text and audio records for the purpose of analyzing their content. As you can imagine, the field is as vast as it is interesting.
Work on NLP dates back to the early years of AI. In fact, the famous Turing test was cast in terms of a computer’s ability to communicate with a human over a cable line without the human being able to tell whether the entity on the other side of the cable is human. For a nice review of the Turing test, see Saygin et al. In a field that old, you can find several branches that tackle the same problem from different angles. Thus, terms such as computational linguistics and speech synthesis refer to research areas that address the same (or closely related) kinds of problems as NLP.
An excellent reference on NLP is Speech and Language Processing by Daniel Jurafsky and James Martin. The authors break down the engineering of natural language into the following components: