15 Data pipeline continuous integration


This chapter covers

  • Separating the data engineering environments
  • Database change management
  • Configuring Snowflake to use Git
  • Using the Snowflake CLI (command line interface)
  • Connecting to Snowflake securely

In previous chapters, we gradually built data pipelines by adding various pieces of functionality. As our knowledge expanded, we created many scripts and files, saving them across multiple chapter folders in the accompanying GitHub repository, which makes it challenging to locate a specific script when it needs maintenance. A more practical approach is to store the scripts in a consistent, well-organized structure in the repository. A centralized code repository is essential when multiple data engineers work on the same data pipelines: it allows them to locate scripts effortlessly and merge their changes into the shared codebase.

15.1 Separating the data engineering environments
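
A common way to separate environments in Snowflake is to give each environment its own database and restrict access by role. A minimal sketch, with hypothetical database and role names (your naming convention may differ):

  -- one database per environment (hypothetical names)
  CREATE DATABASE IF NOT EXISTS pipeline_dev;
  CREATE DATABASE IF NOT EXISTS pipeline_prod;

  -- development roles see only the development database
  GRANT USAGE ON DATABASE pipeline_dev TO ROLE data_engineer;

With this separation in place, code can be developed and tested against the development database and promoted to production only after it passes review.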

15.2 Database change management

15.2.1 Comparing the imperative and the declarative approach to DCM
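
To make the contrast concrete, consider a hedged sketch with a hypothetical ORDERS table. The imperative approach records each change as an incremental migration script, while the declarative approach always states the desired end state and lets the tooling work out the difference (Snowflake's CREATE OR ALTER syntax supports this style natively, where available):

  -- imperative: a migration script applies one incremental change
  ALTER TABLE orders ADD COLUMN delivery_date DATE;

  -- declarative: the script states the desired end state;
  -- CREATE OR ALTER computes the difference
  CREATE OR ALTER TABLE orders (
    order_id NUMBER,
    customer_id NUMBER,
    delivery_date DATE
  );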

15.2.2 Organizing the code in the repository
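
There is no single required layout, but one common convention, shown here as a purely illustrative example, is to mirror the database object hierarchy in the folder structure so that each object's script is easy to find:

  /databases
    /pipeline_db
      /schemas
        /staging
          /tables
          /views
        /marts
          /tables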

15.3 Configuring Snowflake to use Git

15.3.1 Creating a Git repository stage
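
Following Snowflake's documented syntax, creating a Git repository stage takes two steps: an API integration that allows Snowflake to reach the Git provider, and the repository object itself. A sketch with placeholder URLs and object names:

  -- allow Snowflake to reach repositories under the given prefix
  CREATE OR REPLACE API INTEGRATION git_api_integration
    API_PROVIDER = git_https_api
    API_ALLOWED_PREFIXES = ('https://github.com/my-org/')
    ENABLED = TRUE;

  -- create the repository stage itself
  -- (private repositories also need a secret passed via GIT_CREDENTIALS)
  CREATE OR REPLACE GIT REPOSITORY my_repo
    API_INTEGRATION = git_api_integration
    ORIGIN = 'https://github.com/my-org/my-pipelines.git';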

15.3.2 Executing commands from a Git repository stage
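
Once the repository stage exists, its contents can be browsed and executed much like any other stage. A sketch, reusing the repository from the previous step and a hypothetical scripts/deploy.sql file:

  -- fetch the latest commits into the repository stage
  ALTER GIT REPOSITORY my_repo FETCH;

  -- browse a branch, then run a script straight from the stage
  LS @my_repo/branches/main;
  EXECUTE IMMEDIATE FROM @my_repo/branches/main/scripts/deploy.sql;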

15.4 Using the Snowflake CLI (command line interface)

15.4.1 Installing and configuring Snowflake CLI
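
A sketch of one documented installation path (via pip) and the interactive connection setup; the connection name my_connection is a placeholder:

  # install Snowflake CLI (one of several documented installation methods)
  pip install snowflake-cli

  # create a named connection interactively, then verify it
  snow connection add
  snow connection test --connection my_connection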

15.4.2 Executing scripts with Snowflake CLI
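
With a working connection, scripts run through the snow sql command. A sketch, reusing the hypothetical connection and script names from above:

  # run an ad hoc query
  snow sql -q "SELECT CURRENT_VERSION()" --connection my_connection

  # run a script file from the local repository checkout
  snow sql -f scripts/deploy.sql --connection my_connection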

15.4.3 Continuous integration with Snowflake CLI
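
In a CI job there is typically no interactively created config file, so connection settings come from environment variables populated from the CI system's secret store. A sketch, assuming Snowflake CLI's documented SNOWFLAKE_CONNECTIONS_&lt;NAME&gt;_&lt;KEY&gt; environment-variable overrides (the account, user, and ci connection names are placeholders):

  # supply connection settings through environment variables;
  # the password comes from a CI secret, never from the repository
  export SNOWFLAKE_CONNECTIONS_CI_ACCOUNT="myaccount"
  export SNOWFLAKE_CONNECTIONS_CI_USER="ci_user"
  export SNOWFLAKE_CONNECTIONS_CI_PASSWORD="$SNOWFLAKE_CI_SECRET"

  snow sql -f scripts/deploy.sql --connection ci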

15.5 Connecting to Snowflake securely

15.5.1 Configuring key-pair authentication
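
Snowflake's documented key-pair setup has two parts: generate a key pair locally, then attach the public key to the Snowflake user. A sketch (use an encrypted private key in production; the user name is a placeholder):

  # generate an unencrypted private key and derive its public key
  openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
  openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

  # then, in Snowflake, register the public key with the user
  # (paste the contents of rsa_key.pub without the BEGIN/END lines):
  #   ALTER USER my_user SET RSA_PUBLIC_KEY='MIIBIjANBg...';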

15.6 Applying what we learned in real-world scenarios

Summary