10 Topic Modeling

This chapter covers

  • Introduction to topic modelling with Latent Dirichlet Allocation (LDA)
  • Overview of gensim, an NLP toolkit for topic modelling
  • Implementation of an unsupervised topic modelling approach using gensim
  • Introduction to several visualization techniques for exploring topics in data

The previous chapter introduced various NLP and machine learning techniques for topic classification and topic analysis. As a reminder, here is the scenario you worked on: suppose you are a content manager for a large news platform. The platform hosts texts from a wide variety of authors and specializes mainly in the following well-established topics: “Politics”, “Finance”, “Science”, “Sports”, and “Arts”. Your task is to decide which topic each incoming article belongs to and to post it under the relevant tab on the platform. Here are some questions for you to consider:

10.1 Topic Modelling with Latent Dirichlet Allocation

10.1.1 Estimating Parameters for the LDA

10.1.2 LDA as a Generative Model

10.2 Implementation of the Topic Modelling Algorithm

10.2.1 Loading the data

10.2.2 Preprocessing the data

10.2.3 Applying the LDA model

10.2.4 Exploring the results

10.3 Summary

10.4 Solutions to exercises