1 An introduction to DuckDB
This chapter covers
- Why DuckDB, a single-node in-memory database, emerged in the era of big data
- DuckDB’s capabilities
- How DuckDB works and fits into your data pipeline
We’re excited that you’ve picked up this book and are ready to learn about a technology that seems to go against the grain of everything we’ve learned about big data systems over the last decade. We’ve had a lot of fun using DuckDB, and we hope you will be as enthused as we are after reading this book. The book’s approach to teaching is hands-on, concise, and fast-paced, and it includes lots of code examples.
After reading the book, you should be able to use DuckDB to analyze tabular data in a variety of formats. You will also have a handy new tool in your toolbox for data transformation, cleanup, and conversion. You can integrate it into your Python notebooks and processes to replace pandas DataFrames in situations where they do not perform well. You will also be able to build quick data analysis applications using Streamlit with DuckDB. Let’s get started!