This chapter covers
- Using Amazon Comprehend for named entity recognition (NER)
- Understanding Comprehend’s modes of operation (asynchronous, batch, and synchronous)
- Using asynchronous Comprehend services
- Triggering Lambda functions using S3 notifications
- Handling errors in Lambdas using a dead-letter queue
- Processing results from Comprehend
Chapter 8 dealt with the challenge of gathering unstructured data from websites for use in machine learning analysis. This chapter builds on the serverless web crawler from chapter 8. This time, we are concerned with using machine learning to extract meaningful insights from the data we gathered. If you didn’t work through chapter 8, you should go back and do so before proceeding, as we will be building directly on top of the web crawler. If you are already comfortable with that content, we can dive right in and add the information extraction parts.
Let’s remind ourselves of the grand vision for our chapter 8 scenario: finding relevant developer conferences to attend. We want to build a system that allows people to search for conferences and speakers of interest to them. In chapter 8, we built a web crawler that solved the first part of this scenario: gathering data on conferences.