5 Exploring data without persistence
This chapter covers
- Converting CSV files to Parquet files
- Auto-inferring file type and data schema
- Creating views to simplify the querying of nested JSON documents
- Exploring the metadata of Parquet files
- Querying other databases, such as SQLite
In this chapter, we’ll learn how to query data without persisting it in DuckDB. This technique is unusual for a database and may seem counterintuitive, but it is useful in the right situations. For example, if we need to transform data from one format to another, we might not want to create an intermediate storage model along the way.
This chapter also demonstrates the power of DuckDB’s analytical engine, even when your data isn’t stored in the native format. We’ll show how to query several common data formats, including JSON, CSV, and Parquet, as well as other databases, such as SQLite.
The JSON and CSV sources we are working with in this chapter are located in the ch05 folder of our example repository on GitHub: https://github.com/duckdb-in-action/examples. We assume you have navigated to the root of this repository before invoking the DuckDB CLI for the examples in this chapter.
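To make the idea of querying without persistence concrete before we dive in, here is a minimal sketch you could paste into the DuckDB CLI. The file names example.csv and example.parquet are hypothetical and are not part of the example repository. The first statement only creates a tiny throwaway CSV file so that the sketch is self-contained.

```sql
-- Write a small CSV file to disk (hypothetical file name, illustration only)
COPY (
    SELECT * FROM (VALUES (1, 'a'), (2, 'b')) AS t(id, label)
) TO 'example.csv' (HEADER, DELIMITER ',');

-- Query the file directly: no table is created in the database
SELECT count(*) FROM 'example.csv';

-- Convert the CSV file to Parquet, again without an intermediate table
COPY (SELECT * FROM 'example.csv') TO 'example.parquet' (FORMAT parquet);

-- Read the Parquet file back to check the result
SELECT * FROM 'example.parquet';
```

None of these statements creates a table, so the database itself stays empty; the only things written are the two files on disk.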