3 Best practices for data staging
This chapter covers
- Creating external stages to ingest data files from cloud storage
- Viewing stage metadata with directory tables
- Preparing data files for efficient ingestion
- Querying data in external stages with external tables
- Using materialized views to improve query performance
In this chapter, we will build upon the first data pipeline we created in chapter 2. We will add more functionality in the data ingestion step by accessing files from cloud storage rather than the local file system. We will authenticate to the cloud storage provider and create external stages.
Because requirements for data pipelines vary, there is no one-size-fits-all approach to building data pipelines. We will demonstrate and compare different options for ingesting data from cloud storage, including using the COPY
command on the external stage, creating external tables, and creating materialized views to improve query performance. We will also review how to prepare data files in cloud storage for efficient ingestion.
Our first step will be to create an external stage exposing files from cloud storage to Snowflake. In this chapter, we will continue to ingest data from CSV files. In addition to CSV, Snowflake can ingest many structured and semistructured file formats, such as JSON, Parquet, or XML, which we will discuss in the next chapter.