This chapter covers
In chapter 12, you learned how to use all the tools in your NLP toolbox to build an NLP pipeline capable of carrying on a conversation. We demonstrated crude examples of this chatbot dialog capability on small datasets. The humanness, or IQ, of your dialog system seems to be limited by the data you train it with. Most of the NLP approaches you've learned give better and better results if you can scale them up to handle larger datasets.
You may have noticed that your computer bogs down, or even crashes, if you run some of the examples we gave you on large datasets. Some datasets available through nlpia.data.loaders.get_data() will exceed the memory (RAM) of most PCs or laptops.
Besides RAM, another bottleneck in your natural language processing pipelines is the processor. Even if you had unlimited RAM, larger corpora would take days to process with some of the more complex algorithms you’ve learned.
So you need to come up with algorithms that minimize the resources they require:
- Volatile storage (RAM)
- Processing (CPU cycles)
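One common way to keep RAM usage flat no matter how large the corpus grows is to stream the text through a generator instead of loading the whole dataset into memory at once. The sketch below illustrates the idea with hypothetical helper names (`stream_tokens` and `count_tokens` are not part of nlpia); it counts tokens while holding only one line of text in RAM at a time:

```python
import io

def stream_tokens(file_obj):
    """Yield tokens one line at a time, so the whole corpus never sits in RAM."""
    for line in file_obj:
        for token in line.lower().split():
            yield token

def count_tokens(file_obj):
    """Consume the token stream with constant memory, keeping only a running count."""
    total = 0
    for _ in stream_tokens(file_obj):
        total += 1
    return total

# Stand-in for a large corpus file opened with open('corpus.txt')
corpus = io.StringIO("Hello world\nHello again\n")
print(count_tokens(corpus))  # 4
```

The same pattern works for any pipeline stage that can process documents one at a time, which is why streaming APIs are a recurring theme when scaling NLP.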