7 Unsupervised learning for text data
“Everybody smiles in the same language.” – George Carlin
The world has many languages, and language is the most common medium we use to express our thoughts and emotions to each other. The ability to put thoughts into words is unique to humans. Words are a source of information, and they can be written down as text. In this chapter, we explore the analysis we can perform on text data. Text data is unstructured, yet it carries a lot of useful information, which makes it a valuable source of insights for the business. We use natural language processing (NLP) to analyse text data.
Before we can analyse text data, we have to make it analysis-ready. In simple terms, because our algorithms and processors can only work with numbers, we have to represent the text as numbers or vectors. We explore these steps in this chapter. Text data holds the key to several important use cases, such as sentiment analysis, document categorization and language translation. We will cover these use cases through a case study and develop a Python solution for it.
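To make the idea of representing text as vectors concrete, here is a minimal sketch of one simple scheme: counting how often each word occurs in a document (often called bag-of-words). It assumes plain whitespace tokenization and lowercasing, and the function names `build_vocabulary` and `to_count_vector` are illustrative choices for this sketch, not part of any library.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (alphabetical order)."""
    tokens = {token for doc in documents for token in doc.lower().split()}
    return {token: index for index, token in enumerate(sorted(tokens))}


def to_count_vector(document, vocabulary):
    """Return a list of word counts, one entry per vocabulary token."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for token, index in vocabulary.items():
        vector[index] = counts[token]
    return vector


# Two toy documents, assumed here purely for illustration.
documents = ["the food was good", "the service was not good"]

vocabulary = build_vocabulary(documents)
vectors = [to_count_vector(doc, vocabulary) for doc in documents]

print(sorted(vocabulary))  # column order of the vectors
for doc, vector in zip(documents, vectors):
    print(vector, "<-", doc)
```

Each document becomes a list of numbers of the same length, so it can be passed to an algorithm that expects numeric input. Real pipelines usually add steps such as punctuation removal and more careful tokenization before counting.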