chapter nine

9 Controlling costs

This chapter covers

Understanding Snowflake costs
Sizing virtual warehouses
Using persisted query results
Optimizing query performance to reduce spilling
Optimizing performance with data caching
Reducing query queuing
Monitoring compute consumption

Data engineers must understand how Snowflake charges for usage so they can build cost-effective data pipelines. Cost and performance are often intertwined. In some cases, better performance comes at a higher cost. For example, resizing a virtual warehouse to a larger size usually improves performance, but the additional credits consumed can quickly add up. In other cases, poor performance is what drives the cost up: a long-running query keeps the virtual warehouse active longer, and the warehouse consumes credits for as long as it runs. One of the responsibilities of data engineers is to monitor warehouse consumption and use the findings to strike a good balance between improving performance and controlling costs.
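To make this trade-off concrete, here is a minimal Python sketch that estimates the credits a single query run consumes. It assumes the standard warehouse rates (1 credit per hour for X-Small, doubling with each size) and per-second billing with a 60-second minimum each time a warehouse starts. The function name and the run times are hypothetical values chosen for illustration, not measurements.

```python
# Credits per hour for standard warehouses; each size doubles the rate.
CREDITS_PER_HOUR = {"XSMALL": 1, "SMALL": 2, "MEDIUM": 4, "LARGE": 8, "XLARGE": 16}

# A warehouse is billed for at least 60 seconds each time it starts.
MIN_BILLED_SECONDS = 60


def credits_used(size: str, active_seconds: float) -> float:
    """Estimate credits consumed while a warehouse of the given size is active."""
    billed_seconds = max(active_seconds, MIN_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


# Hypothetical run times for the same workload (assumed, not measured).
small_run = credits_used("SMALL", 1200)   # 20 minutes on a Small warehouse
large_fast = credits_used("LARGE", 300)   # 5 minutes: 4x the rate, 4x faster
large_slow = credits_used("LARGE", 900)   # 15 minutes: 4x the rate, barely faster

print(f"SMALL, 20 min: {small_run:.2f} credits")
print(f"LARGE,  5 min: {large_fast:.2f} credits")
print(f"LARGE, 15 min: {large_slow:.2f} credits")
```

Under these assumptions, the larger warehouse costs the same as the smaller one only if the query speeds up in proportion to the higher rate. If it does not, the faster result is paid for with extra credits.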

In this chapter, we will write queries against large amounts of data from the Snowflake Marketplace. We will discover what contributes to the total Snowflake cost and how to monitor credit consumption. We will explain Snowflake virtual warehouses and how to resize them to optimize performance. We will learn to minimize spilling during query execution, use persisted query results, and take advantage of data caching. We will also describe strategies to reduce query queuing and limit the number of concurrently running queries.

9.1 Understanding Snowflake costs

9.1.1 Total Snowflake cost

9.1.2 Compute resources cost

9.1.3 Virtual warehouse credits

9.2 Sizing virtual warehouses

9.2.1 Using persisted query results

9.2.2 Comparing query statistics between differently sized warehouses

9.2.3 Optimizing query performance to reduce spilling

9.3 Optimizing performance with data caching

9.3.1 Illustrating the metadata cache

9.3.2 Utilizing the warehouse cache efficiently

9.4 Reducing query queuing

9.4.1 Examining queuing

9.4.2 Limiting concurrently running queries

9.5 Monitoring compute consumption

Summary