5 Continuous data ingestion
This chapter covers
- Comparing bulk and continuous data ingestion
- Introducing Snowpipe
- Configuring Snowpipe with cloud messaging
- Using and monitoring Snowpipe
- Transforming data continuously with Snowflake dynamic tables
In this chapter, we will build a new data pipeline that continuously ingests data from files as soon as they appear in external cloud storage, with minimal delay. We will explain how continuous data ingestion differs from the bulk data ingestion described in the previous chapters. We will introduce Snowpipe, the Snowflake feature that data pipelines use for continuous data ingestion. Finally, we will use dynamic tables to transform data continuously.
Specifically, the pipeline will use Snowpipe to ingest data from JSON files stored in an external cloud storage location and then convert the data from JSON format to a relational format. Instead of executing stored procedures as in chapter 4, we will materialize the data by creating dynamic tables.
To illustrate how the pipeline in this chapter is built, we will continue working with the fictional bakery introduced in chapter 2. To briefly recap, the bakery makes bread and pastries and delivers these baked goods to small businesses in the neighborhood, such as grocery stores, coffee shops, and restaurants.