chapter four

4 Transforming data

This chapter covers

Ingesting semistructured data from cloud storage
Flattening semistructured data into relational tables
Encapsulating transformations with stored procedures
Implementing exception handling and logging in stored procedures
Building robust data pipelines

In this chapter, we will enhance the data pipeline we built in chapter 3, which ingests data from cloud storage. We will extend the data transformation part of the pipeline and add exception handling and logging, which will help make the pipeline as resilient to unexpected errors as possible.

The first transformation we will add flattens semistructured data in JSON format into relational tables. We will then perform additional data transformations using stored procedures and incorporate these transformation steps into the data pipeline.
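To make the idea of flattening concrete before we get to the details, here is a minimal, platform-neutral sketch in Python. It is not the approach used later in this chapter. The document shape and the field names (order_id, customer, items, product, quantity) are hypothetical and chosen only for illustration. The sketch turns one nested JSON document into one relational row per nested item.

```python
import json

# Hypothetical order document; the field names are illustrative assumptions.
raw = """
{"order_id": 1001, "customer": "Cafe A", "items": [
  {"product": "Baguette", "quantity": 5},
  {"product": "Croissant", "quantity": 12}
]}
"""


def flatten_order(document):
    """Return one (order_id, customer, product, quantity) row per nested item."""
    return [
        (document["order_id"], document["customer"], item["product"], item["quantity"])
        for item in document["items"]
    ]


rows = flatten_order(json.loads(raw))
for row in rows:
    print(row)
```

Each parent attribute is repeated on every row produced from the nested array, which is the same shape a flattened relational table would have.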

Next, we will enhance the pipeline by adding exception handling and logging. If an error occurs or an execution does not complete, the pipeline should be restartable, and rerunning it after the error is fixed should not duplicate data.
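The following sketch illustrates these three properties (exception handling, logging, and safe reruns) in a small, self-contained form. It uses Python with an in-memory SQLite database purely as a stand-in, not the stored procedures developed later in this chapter. The table names (orders, pipeline_log) and the function load_orders are hypothetical. The load runs in a single transaction, so a failure leaves no partial data behind, and it replaces rows by key, so a second run does not create duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table; the CHECK constraint lets us provoke an error.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
    "quantity INTEGER CHECK (quantity > 0))"
)
# Hypothetical log table recording the outcome of every run.
conn.execute("CREATE TABLE pipeline_log (step TEXT, status TEXT, message TEXT)")


def load_orders(conn, rows):
    """Load rows in one transaction and log the outcome of the run."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.executemany(
                # Replacing by primary key keeps reruns from duplicating rows.
                "INSERT OR REPLACE INTO orders (order_id, quantity) VALUES (?, ?)",
                rows,
            )
        status, message = "SUCCESS", f"{len(rows)} rows processed"
    except sqlite3.Error as exc:
        status, message = "ERROR", str(exc)
    with conn:  # the log entry is written whether or not the load succeeded
        conn.execute(
            "INSERT INTO pipeline_log VALUES (?, ?, ?)",
            ("load_orders", status, message),
        )
    return status


first = load_orders(conn, [(1, 5), (2, -1)])  # bad row: the whole batch rolls back
second = load_orders(conn, [(1, 5), (2, 3)])  # rerun after fixing the data
third = load_orders(conn, [(1, 5), (2, 3)])   # accidental second run

row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
log_statuses = [r[0] for r in conn.execute("SELECT status FROM pipeline_log")]
print(first, second, third, row_count)
```

In this sketch, the first run fails and is logged as an error without leaving partial data, and the two later runs succeed while the orders table still ends up with only two rows.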

4.1 Ingesting semistructured data from cloud storage

4.1.1 Creating a storage integration

4.1.2 Creating an external stage

4.1.3 Examining the JSON structure

4.1.4 Ingesting JSON data into a VARIANT data type

4.2 Flattening semistructured data into relational tables

4.3 Encapsulating transformations with stored procedures

4.3.1 Creating a basic stored procedure

4.3.2 Including a return value in a stored procedure

4.3.3 Implementing exception handling in stored procedures

4.4 Adding logging to stored procedures

4.5 Building robust data pipelines

Summary