This chapter covers
- Introducing a use case for machine learning
- Starting with object storage for serverless machine learning
- Using crawlers to automatically discover structured data schemas
- Migrating to column-oriented data storage for more efficient analytics
- Experimenting with PySpark extract-transform-load (ETL) jobs
In the previous chapter, you learned about serverless machine learning platforms and some of the reasons they can help you build a successful machine learning system. In this chapter, you will get started with a pragmatic, real-world use case for a serverless machine learning platform. You will download a data set covering several years of taxi rides in Washington, DC, and use it to build a machine learning model for the use case. As you get familiar with the data set and learn the steps for building a model from it, you will be introduced to the key technologies of a serverless machine learning platform, including object storage, data crawlers, metadata catalogs, and distributed extract-transform-load (ETL) data processing services. Along the way, examples with code and shell commands illustrate how these technologies can be used with Amazon Web Services (AWS), so that you can apply what you learn in your own AWS account.