Part 3 Building data pipelines

 

This part of the book consolidates everything you have learned so far and demonstrates how to build a comprehensive data pipeline that executes on a schedule. The chapters in this part build on each other progressively, culminating in a complete, end-to-end pipeline.

Chapter 11 lays the groundwork by defining the data transformation layers, including extract, staging, data warehouse, and presentation.
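One common way to organize these layers in Snowflake is as separate schemas (or separate databases) that data flows through in order. The following sketch shows one possible layout; the database and schema names are hypothetical and not taken from the book's examples.

-- Hypothetical layout: each transformation layer as a schema in one database
CREATE DATABASE IF NOT EXISTS pipeline_db;
CREATE SCHEMA IF NOT EXISTS pipeline_db.extract;        -- raw data as ingested from the sources
CREATE SCHEMA IF NOT EXISTS pipeline_db.staging;        -- cleaned and conformed data
CREATE SCHEMA IF NOT EXISTS pipeline_db.data_warehouse; -- integrated, historized data
CREATE SCHEMA IF NOT EXISTS pipeline_db.presentation;   -- reporting-ready tables and views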

Chapter 12 introduces incremental data ingestion, which is faster than full ingestion because it moves and stores less data, reducing both storage and compute costs.
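A common pattern for incremental ingestion is to track a high-water mark, such as the timestamp of the most recently loaded record, and to move only the rows that arrived after it. The sketch below illustrates the idea with a hypothetical orders table and loaded_at column; it is one possible approach, not necessarily the one used in chapter 12.

-- Hypothetical incremental load using a high-water mark.
-- Only rows newer than the last loaded timestamp are read and merged,
-- which is why incremental ingestion moves and stores less data than a full reload.
SET last_loaded_ts = (SELECT MAX(loaded_at) FROM dwh.orders);

MERGE INTO dwh.orders AS tgt
USING (
    SELECT order_id, order_status, loaded_at
    FROM staging.orders
    WHERE loaded_at > $last_loaded_ts          -- read only new or changed rows
) AS src
    ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
    tgt.order_status = src.order_status,
    tgt.loaded_at    = src.loaded_at
WHEN NOT MATCHED THEN INSERT (order_id, order_status, loaded_at)
    VALUES (src.order_id, src.order_status, src.loaded_at);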

Chapter 13 explains data pipeline orchestration: scheduling pipeline steps, defining their dependencies, handling errors, and sending notifications so that the pipeline executes reliably and efficiently.
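In Snowflake, scheduling and dependencies can be expressed with tasks: a root task runs on a schedule, and downstream tasks run after their predecessors complete. The minimal sketch below uses hypothetical task, warehouse, and procedure names; error handling and notifications are left to chapter 13.

-- Hypothetical task graph: a scheduled root task and a dependent child task
CREATE OR REPLACE TASK ingest_orders
  WAREHOUSE = transform_wh
  SCHEDULE = 'USING CRON 0 2 * * * UTC'   -- run daily at 02:00 UTC
AS
  CALL ingest_orders_proc();

CREATE OR REPLACE TASK transform_orders
  WAREHOUSE = transform_wh
  AFTER ingest_orders                     -- dependency: runs only after ingestion finishes
AS
  CALL transform_orders_proc();

-- Tasks are created suspended; resume child tasks before the root task.
ALTER TASK transform_orders RESUME;
ALTER TASK ingest_orders RESUME;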

Chapter 14 shows how to conduct data quality tests that validate data integrity and completeness, and how to take remedial measures when test results don't meet the data quality standards.
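A data quality test can be as simple as a query that returns the rows violating a rule: an empty result means the test passes, and any returned rows call for remediation. The checks below are hypothetical examples against an assumed dwh.orders table.

-- Hypothetical data quality checks; returned rows indicate failures

-- Completeness: every order must reference a customer
SELECT order_id
FROM dwh.orders
WHERE customer_id IS NULL;

-- Integrity: order_id must be unique in the target table
SELECT order_id, COUNT(*) AS duplicate_count
FROM dwh.orders
GROUP BY order_id
HAVING COUNT(*) > 1;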

Chapter 15 covers continuous integration, a software development practice in which data engineers frequently merge their code changes into a shared repository. After each merge, automated scripts execute the code, create database objects, perform integration tests, and carry out other necessary actions.

The examples used throughout the book to illustrate data engineering concepts and related Snowflake functionality should provide a solid foundation for continuing your journey in real-world data engineering.