This chapter covers
As you reach the end of this book, it is time to see how to export data. After all, why learn all this if the data never leaves Spark, right? I do appreciate learning as a hobby, but it is even better when you can bring actual business value.
This chapter is divided into three sections. The first section covers exporting data. As usual, you will use a real dataset, ingest it, and then export it. You will play the role of a NASA scientist and start working with data coming from satellites. Such datasets can be used to help prevent wildfires. This is a first step toward using code for good! In this section, you will also see how partitioning affects the way data is exported.
In the second section of this chapter, you will experiment with Delta Lake, an open source storage layer that runs on top of Spark. Delta Lake can radically simplify your data pipeline, and you will see how and why.