Preface
This book covers DuckDB—a modern, fast, embedded analytical database. It runs on your machine and can easily process many gigabytes of data from a variety of sources, including JSON, CSV, Parquet, SQLite, and Postgres. DuckDB integrates well into the Python and R ecosystems and allows you to query in-memory data frames without copying the data. You don’t need to spin up cloud data warehouses for your day-to-day data processing anymore; you can just run DuckDB on your data, locally or in the cloud.
With DuckDB, you can solve your relational data analytics tasks without friction. It is user-friendly and easy to learn. Best of all, you can embed it in your Python environments and applications, much like SQLite. We strongly believe that we have hit the sweet spot in teaching DuckDB. We cover its command-line interface (CLI) and embedded mode, its Python integrations, and its capabilities for building data pipelines and processing data, all while guiding readers through a painless deep dive into modern SQL with DuckDB.
While all of us are longtime data practitioners and educators, we come from different corners of the database world: graph, real-time columnar, and relational databases. Yet we all find something of value in DuckDB that we think is worth speaking about. We enjoy using DuckDB a lot, both outside our areas of expertise and as a useful tool within our respective fields of work.