This section covers:
- What is Natural Language Processing?
- Comparing texts based on word overlap
- Comparing texts using 1-dimensional arrays called vectors
- Comparing texts using 2-dimensional arrays called matrices
- Efficient matrix computation using NumPy
Rapid text analysis can save lives. Consider this real-world incident, in which US soldiers stormed a terrorist compound. Inside the compound, they discovered a computer containing terabytes of archived data. The data included documents, text messages, and e-mails pertaining to terrorist activities. There were far too many documents for any single human being to read. Fortunately, the soldiers were equipped with special software for very fast text analysis. The software allowed the soldiers to process all of the text data without having to leave the compound. The onsite analysis immediately revealed an active terrorist plot in a nearby neighborhood. The soldiers responded to the plot right away and prevented a terrorist attack.
This swift defensive response would not have been possible without NLP techniques. NLP stands for Natural Language Processing, a branch of data science that focuses on the rapid, automated analysis of text. Typically, NLP is applied to very large text datasets. NLP use cases are numerous and diverse. They include: