Part 2. Deeper learning (neural networks)


Part 1 gathered the tools for natural language processing and dove into machine learning with statistics-driven vector space models. You discovered that even more meaning could be found when you looked at the statistics of connections between words.[1] You learned about algorithms such as latent semantic analysis that can help make sense of those connections by gathering words into topics.

1  Conditional probability is one term for these connection statistics (how often a word occurs given that other words occur before or after the “target” word). Cross correlation is another of these statistics (the likelihood of two words occurring together). The singular values and singular vectors of the word-document matrix can be used to collect words into topics, which are linear combinations of word counts.
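
To make that last claim concrete, here is a minimal sketch (the tiny word list, the word-document counts, and the choice of two topics are all made up for illustration) showing how the left singular vectors of a word-document count matrix group words into topics, each one a linear combination of word counts:

```python
import numpy as np

words = ["cat", "dog", "pet", "nlp", "text", "parse"]
# Rows are words, columns are documents (raw term counts for a toy corpus).
word_doc = np.array([
    [2, 0, 1, 0],   # cat
    [1, 0, 2, 0],   # dog
    [1, 0, 1, 0],   # pet
    [0, 2, 0, 1],   # nlp
    [0, 1, 0, 2],   # text
    [0, 1, 0, 1],   # parse
], dtype=float)

# SVD factors the matrix as word_doc = U @ np.diag(s) @ Vt.
U, s, Vt = np.linalg.svd(word_doc, full_matrices=False)

# Each column of U is a "topic": a weighted combination of the word rows.
n_topics = 2
for topic in range(n_topics):
    weights = U[:, topic]
    top = np.argsort(-np.abs(weights))[:3]       # words with the largest weights
    print(f"topic {topic}: " +
          ", ".join(f"{words[i]} ({weights[i]:+.2f})" for i in top))
```

Running this on the toy counts above groups the pet-related words into one topic and the NLP-related words into the other; the signs of the weights are arbitrary, so only their relative magnitudes matter.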

But part 1 considered only linear relationships between words, and you often had to use human judgment to design feature extractors and select model parameters. The neural networks of part 2 accomplish most of that tedious feature extraction work for you, and the models of part 2 are often more accurate than those you could build with the hand-tuned feature extractors of part 1.