15 Data pipeline continuous integration


This chapter covers

  • Separating the data engineering environments
  • Database change management
  • Configuring Snowflake to use Git
  • Using the Snowflake CLI (command line interface)
  • Connecting to Snowflake securely

In previous chapters, we gradually built data pipelines by adding various pieces of functionality. As our knowledge expanded, we created many scripts and files, saving them across multiple chapter folders in the accompanying GitHub repository, which makes it challenging to locate a specific script when it needs maintenance. A more practical approach is to store the scripts in a consistent, well-organized structure in the repository. A centralized code repository is essential when multiple data engineers work on the same data pipelines: it allows them to locate scripts effortlessly and merge their changes into the shared codebase.

15.1 Separating the data engineering environments
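
A common way to separate environments in Snowflake is to give each environment its own database and restrict access by role. A minimal sketch, with hypothetical database and role names (your naming convention may differ):

  -- one database per environment (hypothetical names)
  CREATE DATABASE IF NOT EXISTS pipeline_dev;
  CREATE DATABASE IF NOT EXISTS pipeline_prod;

  -- development roles see only the development database
  GRANT USAGE ON DATABASE pipeline_dev TO ROLE data_engineer;

With this separation in place, code can be developed and tested against the development database and promoted to production only after it passes review.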

15.2 Database change management

15.2.1 Comparing the imperative and the declarative approach to DCM
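
To make the contrast concrete, consider a hedged sketch with a hypothetical ORDERS table. The imperative approach records each change as an incremental migration script, while the declarative approach always states the desired end state and lets the tooling work out the difference (Snowflake's CREATE OR ALTER syntax supports this style natively, where available):

  -- imperative: a migration script applies one incremental change
  ALTER TABLE orders ADD COLUMN delivery_date DATE;

  -- declarative: the script states the desired end state;
  -- CREATE OR ALTER computes the difference
  CREATE OR ALTER TABLE orders (
    order_id NUMBER,
    customer_id NUMBER,
    delivery_date DATE
  );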

15.2.2 Organizing the code in the repository
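
There is no single required layout, but one common convention, shown here as a purely illustrative example, is to mirror the database object hierarchy in the folder structure so that each object's script is easy to find:

  /databases
    /pipeline_db
      /schemas
        /staging
          /tables
          /views
        /marts
          /tables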

15.3 Configuring Snowflake to use Git

15.3.1 Creating a Git repository stage
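
Following Snowflake's documented syntax, creating a Git repository stage takes two steps: an API integration that allows Snowflake to reach the Git provider, and the repository object itself. A sketch with placeholder URLs and object names:

  -- allow Snowflake to reach repositories under the given prefix
  CREATE OR REPLACE API INTEGRATION git_api_integration
    API_PROVIDER = git_https_api
    API_ALLOWED_PREFIXES = ('https://github.com/my-org/')
    ENABLED = TRUE;

  -- create the repository stage itself
  -- (private repositories also need a secret passed via GIT_CREDENTIALS)
  CREATE OR REPLACE GIT REPOSITORY my_repo
    API_INTEGRATION = git_api_integration
    ORIGIN = 'https://github.com/my-org/my-pipelines.git';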

15.3.2 Executing commands from a Git repository stage
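
Once the repository stage exists, its contents can be browsed and executed much like any other stage. A sketch, reusing the repository from the previous step and a hypothetical scripts/deploy.sql file:

  -- fetch the latest commits into the repository stage
  ALTER GIT REPOSITORY my_repo FETCH;

  -- browse a branch, then run a script straight from the stage
  LS @my_repo/branches/main;
  EXECUTE IMMEDIATE FROM @my_repo/branches/main/scripts/deploy.sql;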

15.4 Using the Snowflake CLI (command line interface)

15.4.1 Installing and configuring Snowflake CLI
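
A sketch of one documented installation path (via pip) and the interactive connection setup; the connection name my_connection is a placeholder:

  # install Snowflake CLI (one of several documented installation methods)
  pip install snowflake-cli

  # create a named connection interactively, then verify it
  snow connection add
  snow connection test --connection my_connection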

15.4.2 Executing scripts with Snowflake CLI
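
With a working connection, scripts run through the snow sql command. A sketch, reusing the hypothetical connection and script names from above:

  # run an ad hoc query
  snow sql -q "SELECT CURRENT_VERSION()" --connection my_connection

  # run a script file from the local repository checkout
  snow sql -f scripts/deploy.sql --connection my_connection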

15.4.3 Continuous integration with Snowflake CLI
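
In a CI job there is typically no interactively created config file, so connection settings come from environment variables populated from the CI system's secret store. A sketch, assuming Snowflake CLI's documented SNOWFLAKE_CONNECTIONS_&lt;NAME&gt;_&lt;KEY&gt; environment-variable overrides (the account, user, and ci connection names are placeholders):

  # supply connection settings through environment variables;
  # the password comes from a CI secret, never from the repository
  export SNOWFLAKE_CONNECTIONS_CI_ACCOUNT="myaccount"
  export SNOWFLAKE_CONNECTIONS_CI_USER="ci_user"
  export SNOWFLAKE_CONNECTIONS_CI_PASSWORD="$SNOWFLAKE_CI_SECRET"

  snow sql -f scripts/deploy.sql --connection ci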

15.5 Connecting to Snowflake securely

15.5.1 Configuring key-pair authentication
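
Snowflake's documented key-pair setup has two parts: generate a key pair locally, then attach the public key to the Snowflake user. A sketch (use an encrypted private key in production; the user name is a placeholder):

  # generate an unencrypted private key and derive its public key
  openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out rsa_key.p8 -nocrypt
  openssl rsa -in rsa_key.p8 -pubout -out rsa_key.pub

  # then, in Snowflake, register the public key with the user
  # (paste the contents of rsa_key.pub without the BEGIN/END lines):
  #   ALTER USER my_user SET RSA_PUBLIC_KEY='MIIBIjANBg...';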

15.6 Applying what we learned in real-world scenarios

Summary