chapter one

1 An introduction to DuckDB

This chapter covers

Why DuckDB, a single-node, in-memory database, emerged in the era of big data
DuckDB’s capabilities
How DuckDB works and fits into your data pipeline

We’re excited that you’ve picked up this book and are ready to learn about a technology that seems to go against the grain of everything we’ve learned about big data systems over the last decade. We’ve had a lot of fun using DuckDB, and we hope you will be as enthusiastic as we are after reading this book. The book’s approach to teaching is hands-on, concise, and fast-paced, with lots of code examples.

After reading the book, you should be able to use DuckDB to analyze tabular data in a variety of formats. You will also have a handy new tool in your toolbox for data transformation, cleanup, and conversion. You can integrate it into your Python notebooks and processes to replace pandas DataFrames in situations where they do not perform well enough. You will be able to build quick applications for data analysis using Streamlit with DuckDB. Let’s get started!

1.1 What is DuckDB?

1.2 Why should you care about DuckDB?

1.3 When should you use DuckDB?

1.4 When should you not use DuckDB?

1.5 Use cases

1.6 Where does DuckDB fit in?

1.7 Steps of the data processing flow

1.7.1 Data formats and sources

1.7.2 Data structures

1.7.3 Developing the SQL

1.7.4 Using or processing the results

Summary