9 Controlling costs
This chapter covers
- Understanding Snowflake costs
- Sizing virtual warehouses
- Using persisted query results
- Optimizing query performance to reduce spilling
- Optimizing performance with data caching
- Reducing query queuing
- Monitoring compute consumption
Data engineers must understand how Snowflake incurs costs so they can build cost-effective data pipelines. Cost and performance are often intertwined. In some cases, better performance comes at a higher cost. For example, upgrading a virtual warehouse to a larger size usually improves performance, but each step up in size doubles the credits the warehouse consumes per hour, so the cost can quickly add up. In other cases, such as long-running queries, poor performance results in a higher cost because the query keeps the virtual warehouse active longer. One of the responsibilities of data engineers is to monitor warehouse consumption and use the findings to strike a good balance between improving performance and controlling costs.
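The trade-off between warehouse size and runtime can be made concrete with a little arithmetic. The following Python snippet is a simplified model, not Snowflake's billing engine. It assumes the standard warehouse credit rates (X-Small consumes 1 credit per hour, and each larger size doubles that) and per-second billing with a 60-second minimum each time a warehouse resumes. The helper name credits_for_run is hypothetical. The model ignores idle time before auto-suspend, cloud services charges, and the price per credit, which depends on your edition and region.

```python
# Simplified model of virtual warehouse credit consumption (illustrative only).
# Assumption: standard warehouse rates, where each size doubles the previous one.
CREDITS_PER_HOUR = {
    "XSMALL": 1,
    "SMALL": 2,
    "MEDIUM": 4,
    "LARGE": 8,
    "XLARGE": 16,
    "2XLARGE": 32,
    "3XLARGE": 64,
    "4XLARGE": 128,
}

# Assumption: billing is per second, with at least 60 seconds billed per resume.
MINIMUM_BILLED_SECONDS = 60


def credits_for_run(size: str, active_seconds: float) -> float:
    """Hypothetical helper: estimate credits for one period of warehouse activity.

    Ignores auto-suspend idle time and cloud services charges.
    """
    billed_seconds = max(active_seconds, MINIMUM_BILLED_SECONDS)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


if __name__ == "__main__":
    # Same work on two sizes: the Large warehouse either halves the runtime
    # (perfect scaling) or only shortens it by 25 percent.
    scenarios = [("MEDIUM", 600), ("LARGE", 300), ("LARGE", 450), ("XSMALL", 10)]
    for size, seconds in scenarios:
        print(f"{size:>8} for {seconds:>4} s -> {credits_for_run(size, seconds):.3f} credits")
```

Under these assumptions, a query that runs for 600 seconds on a Medium warehouse and 300 seconds on a Large one consumes the same number of credits, because the larger size costs twice as much per second but finishes in half the time. If the Large warehouse only cuts the runtime to 450 seconds, the same work costs 1.0 credit instead of about 0.67, so a larger warehouse pays off only when the query actually scales with it. The last scenario shows the 60-second minimum: a 10-second burst on an X-Small warehouse is billed as a full minute.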
In this chapter, we will write queries against large datasets from the Snowflake Marketplace. We will discover what contributes to Snowflake's costs and how to monitor credit consumption. We will explain how Snowflake virtual warehouses work and how to resize them to optimize performance. We will learn to minimize spilling during query execution, take advantage of data caching, and use persisted query results. We will also describe strategies to reduce query queuing and manage concurrently running queries.