10 Topic modeling

This chapter covers

  • Introducing topic modeling with latent Dirichlet allocation
  • Exploring gensim, an NLP toolkit for topic modeling
  • Implementing an unsupervised topic modeling approach using gensim
  • Introducing several visualization techniques for topic exploration in data

The previous chapter introduced various NLP and machine-learning techniques for topic classification and topic analysis. Here is a reminder of the scenario that you’ve worked on: suppose you work as a content manager for a large news platform. Your platform hosts texts from a wide variety of authors and mainly specializes in the following set of well-established topics: Politics, Finance, Science, Sports, and Arts. Your task is to decide, for every incoming article, which topic it belongs to and post it under the relevant tab on the platform. Here are some questions for you to consider:

10.1 Topic modeling with latent Dirichlet allocation

10.1.1 Exercise 10.1: Question 1 solution

10.1.2 Exercise 10.1: Question 2 solution

10.1.3 Estimating parameters for the LDA

10.1.4 LDA as a generative model

10.2 Implementation of the topic modeling algorithm

10.2.1 Loading the data

10.2.2 Preprocessing the data

10.2.3 Applying the LDA model

10.2.4 Exploring the results

Summary

Solutions to miscellaneous exercises