Thank you for purchasing the MEAP for DuckDB in Action. We hope that the information you’ll get access to will be of immediate use to you and, with your help, the final book will turn out great!

This book is written for developers who want to do more with less and approach data processing with lighter-weight tooling than usual, namely an embedded SQL analytics database.

When we started to work with DuckDB, we quickly found a number of incredibly useful applications: joining CSV files together on the fly or reshaping them in any way, analyzing Parquet data stored in AWS S3 buckets, and serving as an analytical backend for quick dashboards. If you work with data, chances are high that you pass tabular data between processes, and tabular data is an ideal shape to be pre-processed, filtered, or enriched with a fast, embeddable SQL database.

DuckDB has a modern database architecture based on parallel processing of optimized vectors, which makes it very fast on modern CPUs. It is so fast that some queries against files from other databases run faster in DuckDB than in the original database.

To help you make the best possible use of DuckDB, we focus on two things:

  1. How to integrate DuckDB in your data analytics pipelines, whether they are CLI operations in your terminal, data pipelines in the cloud, or your Python applications or notebooks.
  2. What kinds of queries you can run with and in DuckDB.