11 Designing data pipelines

 

This chapter covers

  • Designing data pipelines
  • Comparing data pipeline patterns
  • Choosing data transformation layers
  • Defining role-based access control
  • Building a sample data pipeline

A data pipeline is an automated sequence of steps designed to support the extraction, movement, ingestion, transformation, storage, and presentation of data from a source to a target platform. Data engineers must design data pipelines before writing code to implement them. To create a sound design, they must understand the purpose of the data pipeline, identify its data sources and targets, decide on a data pipeline pattern, choose the appropriate data transformation layers, and consider other user requirements, such as data governance and security.

In this chapter, we will design a data pipeline that ingests data from multiple sources. We will compare data pipeline patterns, including ETL (extract-transform-load), ELT (extract-load-transform), and ETLT (extract-transform-load-transform), and choose the data transformation layers: extract, staging, data warehouse, and presentation. Finally, we will set up role-based access control so that only authorized users can access the data in each layer of the pipeline.
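Before we dive in, the following minimal sketch shows how the patterns differ. It is plain Python with placeholder data and hypothetical extract, transform, and load helpers (none of them come from the sample pipeline built later in this chapter); the only point it illustrates is that ETL and ELT perform the same steps but differ in where the transformation runs relative to the load.

# A minimal sketch contrasting the ETL and ELT patterns.
# The source records, the in-memory "warehouse" list, and the cleaning
# rule are placeholders; only the ordering of the steps matters here.

def extract():
    """Extract raw records from a stand-in source system."""
    return [{"id": 1, "amount": " 42 "}, {"id": 2, "amount": "17"}]

def transform(records):
    """Clean and type the raw records (the T step)."""
    return [{"id": r["id"], "amount": int(r["amount"].strip())} for r in records]

def load(records, target):
    """Load records into a stand-in target table."""
    target.extend(records)
    return target

# ETL: transform before loading, so only cleaned data reaches the target.
etl_target = load(transform(extract()), target=[])

# ELT: load the raw data first, then transform inside the target platform.
elt_raw = load(extract(), target=[])
elt_target = transform(elt_raw)

assert etl_target == elt_target  # same result, different place for the T step

An ETLT pipeline simply combines the two: a light transformation (for example, masking sensitive fields) before the load, followed by the heavier modeling transformations inside the target platform.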

11.1 Designing data pipelines

11.1.1 Extracting data

11.1.2 Comparing data pipeline patterns

11.1.3 Choosing data transformation layers

11.1.4 Organizing data warehouse layers

11.1.5 Creating schemas with access control

11.2 Building a sample data pipeline

11.2.1 Implementing the extraction layer

11.2.2 Implementing the staging layer

11.2.3 Implementing the data warehouse layer

11.2.4 Implementing the reporting layer

Summary