4 Transforming data
This chapter covers
- Ingesting semistructured data from cloud storage
- Flattening semistructured data into relational tables
- Encapsulating transformations with stored procedures
- Implementing exception handling and logging in stored procedures
- Building robust data pipelines
In this chapter, we will enhance the data pipeline we built in chapter 3, which ingests data from cloud storage. We will extend the data transformation part of the pipeline and add exception handling and logging, so that the pipeline is as resistant to unintentional errors as possible.
The first type of transformation we will add to the pipeline is flattening semistructured data in JSON format into relational tables. Then we will perform additional data transformations using stored procedures and add these transformation steps to the data pipeline.
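To make the idea of flattening concrete before we build it into the pipeline, here is a minimal sketch in plain Python. The document shape and the field names (`order_id`, `customer`, `items`, `product`, `quantity`) are assumptions made up for illustration, not the data used later in the chapter. The sketch turns one nested JSON document into one flat row per array element, which is the shape a relational table expects.

```python
import json

# Hypothetical order document; the structure and field names are
# illustrative assumptions only.
raw = """
{"order_id": 1001, "customer": "Cafe A",
 "items": [{"product": "Baguette", "quantity": 12},
           {"product": "Croissant", "quantity": 30}]}
"""

def flatten_order(doc):
    """Return one flat row (dict) per element of the nested 'items' array."""
    rows = []
    for item in doc.get("items", []):
        rows.append({
            # Parent attributes are repeated on every child row.
            "order_id": doc["order_id"],
            "customer": doc["customer"],
            # Child attributes come from the array element itself.
            "product": item["product"],
            "quantity": item["quantity"],
        })
    return rows

rows = flatten_order(json.loads(raw))
for row in rows:
    print(row)
```

Each printed row holds only scalar values, so it could be inserted into a table with the columns `order_id`, `customer`, `product`, and `quantity`.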
Next, we will enhance the pipeline by adding exception handling and logging. If an error or incomplete execution occurs, the pipeline must be restartable, and rerunning it after the error is fixed must not duplicate data.
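The restart requirement can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for the data platform, and the `orders` table and `load_orders` function are hypothetical names chosen for this example. The load step is keyed on a primary key and runs inside a transaction, so a failed run leaves no partial data behind and a second run of the same batch does not create duplicates.

```python
import logging
import sqlite3

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

def load_orders(conn, rows):
    """Load rows keyed on order_id; safe to run again with the same rows."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            conn.executemany(
                "INSERT OR REPLACE INTO orders (order_id, quantity) "
                "VALUES (?, ?)",
                rows,
            )
        logger.info("Loaded %d rows", len(rows))
    except sqlite3.Error:
        # Log the failure with its traceback, then let the caller decide.
        logger.exception("Load failed; transaction rolled back")
        raise

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, quantity INTEGER)"
)

batch = [(1001, 12), (1002, 30)]
load_orders(conn, batch)
load_orders(conn, batch)  # simulate a rerun after fixing an error

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)
```

Because the second call replaces the existing rows instead of appending to them, the table holds the same two rows after the rerun as it did after the first run.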