This chapter covers
In big data and enterprise contexts, relational databases are often the source of the data on which you will perform analytics. It makes sense to understand how to extract data from those databases, either as whole tables or through SQL SELECT statements.
In this chapter, you’ll learn several ways to ingest data from those relational databases, either by ingesting a full table at once or by asking the database to perform some operations before the ingestion. Those operations could be filtering, joining, or aggregating data at the database level to minimize data transfer.
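As a rough illustration of the two approaches, the sketch below builds the option sets that Spark's JDBC data source accepts: one naming a whole table, the other wrapping a SELECT statement as an aliased subquery so the database does the filtering. It is only a sketch under assumptions: the helper functions, the connection URL, the credentials, and the `actor` table are hypothetical, and the Spark call itself is shown as a comment because it needs a running database and a JDBC driver.

```python
def full_table_options(url, table, user, password):
    """Build JDBC reader options for ingesting an entire table."""
    return {"url": url, "dbtable": table, "user": user, "password": password}


def pushdown_options(url, select_sql, user, password, alias="src"):
    """Build JDBC reader options that let the database run a SELECT first.

    The `dbtable` option accepts a parenthesized subquery with an alias
    in place of a table name, so only the query result is transferred.
    """
    return {
        "url": url,
        "dbtable": f"({select_sql}) {alias}",
        "user": user,
        "password": password,
    }


# Hypothetical connection details, for illustration only.
URL = "jdbc:mysql://localhost:3306/sakila"

whole = full_table_options(URL, "actor", "reader", "secret")
filtered = pushdown_options(
    URL,
    "SELECT actor_id, last_name FROM actor WHERE last_name LIKE 'A%'",
    "reader",
    "secret",
)

# With PySpark installed and the JDBC driver on the classpath, either
# dictionary could be handed to the reader, for example:
#   df = spark.read.format("jdbc").options(**filtered).load()
```

Keeping the SELECT inside `dbtable` means the filter runs in the database, so only matching rows cross the network.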
You will also see which databases Spark supports. When you work with a database that Spark does not support, a custom dialect is required. The dialect is a way to inform Spark of how to communicate with the database. Spark comes with a few dialects and, in most cases, you won’t even need to think about them. However, for those special situations, you’ll learn how to build one.
Figure 8.1 This chapter focuses on ingestion from databases, whether the database is supported by Spark or requires a custom dialect.
