Chapter 8. Advanced NLP example: movie review sentiment

This chapter covers

Using a real-world dataset for predicting sentiment from movie reviews
Exploring possible use cases for this data and the appropriate modeling strategy
Building an initial model using basic NLP features and optimizing the parameters
Improving the accuracy of the model by extracting more-advanced NLP features
Scaling and other deployment aspects of using this model in production

In this chapter, you’ll use some of the advanced feature-engineering knowledge acquired in the previous chapter to solve a real-world problem. Specifically, you’ll use advanced text and NLP feature-engineering processes to build and optimize a model based on user-submitted reviews of movies.

As always, you’ll start by investigating and analyzing the dataset at hand to understand the feature and target columns so you can make the best decisions about which feature-extraction and ML algorithms to use. You’ll then build the initial model using the simplest feature-extraction algorithms to see how quickly you can get a useful model with only a few lines of code. Next, you’ll dig a little deeper into the library of feature-extraction and ML modeling algorithms to improve the accuracy of the model further. You’ll conclude by exploring various deployment and scalability aspects of putting the model into production.
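To give a feel for what "a few lines of code" can look like, here is a minimal sketch of a bag-of-words sentiment classifier written with only the Python standard library. Everything in it is an assumption made for illustration: the four toy reviews, the function names (tokenize, train_naive_bayes, predict), and the choice of a multinomial naive Bayes classifier are not taken from the chapter's dataset or code, and the models built later in the chapter may differ.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into words (a very basic tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(reviews, labels, alpha=1.0):
    """Count words per class; alpha is the additive (Laplace) smoothing term."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    class_counts = Counter(labels)      # label -> number of reviews
    vocab = set()
    for text, label in zip(reviews, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "word_counts": word_counts,
        "class_counts": class_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict(model, text):
    """Return the label with the highest log-probability for the review."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        denom = sum(counts.values()) + alpha * vocab_size
        score = math.log(n_docs / total_docs)  # class prior
        for token in tokenize(text):
            if token in model["vocab"]:  # words never seen in training are ignored
                score += math.log((counts[token] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: 1 = positive review, 0 = negative review.
train_reviews = [
    "a wonderful, moving film with great acting",
    "great story and a wonderful cast",
    "boring plot and terrible acting",
    "a terrible, boring waste of time",
]
train_labels = [1, 1, 0, 0]

model = train_naive_bayes(train_reviews, train_labels)
print(predict(model, "wonderful acting and a great cast"))
print(predict(model, "terrible and boring"))
```

The sketch treats each review as an unordered bag of words, which is the simplest kind of text feature. A real dataset would also need a held-out test set to measure accuracy, and a library implementation would normally replace the hand-written counting.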

8.1. Exploring the data and use case

8.2. Extracting basic NLP features and building the initial model

8.3. Advanced algorithms and model deployment considerations

8.4. Summary

8.5. Terms from this chapter

Word	Definition
word2vec	An NLP modeling framework, initially released by Google and used in many state-of-the-art machine-learning systems involving natural language
hyperparameter optimization	Various techniques for choosing parameters that control ML algorithms’ execution to maximize their performance