5 Continuous data ingestion

 

This chapter covers

  • Comparing bulk and continuous data ingestion
  • Introducing Snowpipe
  • Configuring Snowpipe with cloud messaging
  • Using and monitoring Snowpipe
  • Transforming data continuously with Snowflake dynamic tables

In this chapter, we will build a new data pipeline that continuously ingests data from files, with minimal delay, whenever they appear in external cloud storage. We will explain how continuous ingestion differs from the bulk data ingestion described in the previous chapters. We will introduce Snowpipe, the Snowflake feature used in data pipelines for continuous data ingestion. Finally, we will use dynamic tables to perform data transformation continuously.

We will build a data pipeline that uses Snowpipe to ingest data from JSON files stored in an external cloud storage location. We will convert the data from JSON format to a relational format. Instead of executing stored procedures as in chapter 4, we will materialize the data by creating dynamic tables.

To illustrate the examples used to build the pipeline in this chapter, we will continue working with the fictional bakery introduced in chapter 2. To briefly recap, the bakery makes bread and pastries and delivers these baked goods to small businesses, such as grocery stores, coffee shops, and restaurants in the neighborhood.

5.1 Comparing bulk and continuous data ingestion

5.2 Preparing files in cloud storage

5.2.1 Creating a storage integration
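As a preview of this step, a storage integration delegates authentication to the cloud provider so that no credentials are stored in Snowflake. A minimal sketch for Azure blob storage follows; the integration name, tenant ID, and storage location are illustrative placeholders, not values from the chapter:

```sql
-- Sketch only: object name and placeholder values are hypothetical.
CREATE STORAGE INTEGRATION bakery_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'AZURE'
  ENABLED = TRUE
  AZURE_TENANT_ID = '<tenant_id>'
  STORAGE_ALLOWED_LOCATIONS = ('azure://<account>.blob.core.windows.net/<container>/');
```

After creating the integration, `DESC STORAGE INTEGRATION bakery_integration` returns the consent URL and the multi-tenant app name needed to grant Snowflake access on the Azure side.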

5.2.2 Creating an external stage
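The external stage then points at the cloud storage location through the storage integration. A minimal sketch, assuming the hypothetical integration name from the previous step and a JSON file format:

```sql
-- Sketch only: names and URL placeholders are hypothetical.
CREATE STAGE orders_stage
  STORAGE_INTEGRATION = bakery_integration
  URL = 'azure://<account>.blob.core.windows.net/<container>/'
  FILE_FORMAT = (TYPE = 'JSON');
```

Listing the stage with `LIST @orders_stage;` is a quick way to verify that Snowflake can reach the files before configuring Snowpipe.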

5.3 Configuring Snowpipe with cloud messaging

5.3.1 Configuring event grid messages for blob storage events
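On the Azure side, an Event Grid subscription routes blob-created events from the storage account to a storage queue that Snowflake can poll. One way to create such a subscription with the Azure CLI is sketched below; the subscription name, resource IDs, and queue name are placeholders:

```sh
# Sketch only: all resource identifiers are hypothetical placeholders.
az eventgrid event-subscription create \
  --name snowpipe-events \
  --source-resource-id "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>" \
  --endpoint-type storagequeue \
  --endpoint "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>/queueservices/default/queues/<queue>" \
  --included-event-types Microsoft.Storage.BlobCreated
```

Filtering to `Microsoft.Storage.BlobCreated` ensures the queue receives a message only when a new file lands in blob storage.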

5.3.2 Creating a notification integration
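The notification integration is the Snowflake-side counterpart of the Event Grid setup: it tells Snowflake which cloud queue to poll for file-arrival messages. A minimal sketch, with placeholder values:

```sql
-- Sketch only: integration name, queue URI, and tenant ID are hypothetical.
CREATE NOTIFICATION INTEGRATION bakery_notification
  TYPE = QUEUE
  NOTIFICATION_PROVIDER = AZURE_STORAGE_QUEUE
  ENABLED = TRUE
  AZURE_STORAGE_QUEUE_PRIMARY_URI = 'https://<account>.queue.core.windows.net/<queue>'
  AZURE_TENANT_ID = '<tenant_id>';
```

As with the storage integration, `DESC NOTIFICATION INTEGRATION` reveals the consent URL for granting Snowflake read access to the queue.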

5.3.3 Creating a pipe object
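The pipe object ties the pieces together: it wraps a `COPY INTO` statement and, with `AUTO_INGEST = TRUE`, executes it whenever the notification integration delivers a message about a new file. A minimal sketch, assuming the hypothetical stage and integration names from earlier steps and a single-column target table for the raw JSON:

```sql
-- Sketch only: pipe, table, stage, and integration names are hypothetical.
CREATE PIPE orders_pipe
  AUTO_INGEST = TRUE
  INTEGRATION = 'BAKERY_NOTIFICATION'
AS
  COPY INTO orders_raw
  FROM @orders_stage;
```

Note that the `COPY INTO` statement inside a pipe supports only a limited set of transformations; heavier reshaping is deferred to views or dynamic tables downstream.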

5.3.4 Ingesting data continuously
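Once files start arriving, two built-in tools help monitor the pipe. A sketch, assuming the hypothetical pipe and table names used above:

```sql
-- Check the pipe's execution state and any pending files.
SELECT SYSTEM$PIPE_STATUS('orders_pipe');

-- Review which files were loaded into the target table in the last hour.
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'ORDERS_RAW',
  START_TIME => DATEADD(hour, -1, CURRENT_TIMESTAMP())));
```

`SYSTEM$PIPE_STATUS` returns a JSON document whose `executionState` field should read `RUNNING` when the pipe is healthy.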

5.3.5 Flattening the JSON structure to relational format
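Flattening converts the semi-structured JSON into rows and typed columns, typically with `LATERAL FLATTEN`. A minimal sketch; the column and JSON attribute names (`raw_json`, `items`, `product`, `quantity`) are hypothetical stand-ins for the bakery's actual order structure:

```sql
-- Sketch only: column and attribute names are hypothetical.
SELECT
  raw_json:customer::varchar      AS customer,
  raw_json:delivery_date::date    AS delivery_date,
  item.value:product::varchar     AS product,
  item.value:quantity::number     AS quantity
FROM orders_raw,
  LATERAL FLATTEN(input => raw_json:items) item;
```

Each element of the `items` array becomes its own row, while the order-level attributes repeat across those rows.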

5.4 Transforming data with dynamic tables
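A dynamic table materializes the result of a query and keeps it refreshed automatically within a declared target lag, which is what lets the transformation run continuously instead of via scheduled stored procedures. A minimal sketch, assuming a hypothetical warehouse name and the flattening query from the previous section:

```sql
-- Sketch only: table and warehouse names, lag, and query are illustrative.
CREATE DYNAMIC TABLE orders_flattened
  TARGET_LAG = '1 minute'
  WAREHOUSE = bakery_wh
AS
  SELECT
    raw_json:customer::varchar   AS customer,
    item.value:product::varchar  AS product,
    item.value:quantity::number  AS quantity
  FROM orders_raw,
    LATERAL FLATTEN(input => raw_json:items) item;
```

The `TARGET_LAG` setting is a freshness goal, not a schedule: Snowflake decides when to refresh so that the table never falls further behind its sources than the declared lag.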

Summary