1 An introduction to DuckDB

This chapter covers

Why DuckDB, a single-node, in-memory database, emerged in the era of big data
DuckDB’s capabilities
How DuckDB works and fits into your data pipeline

We’re excited that you’ve picked up this book and are ready to learn about a technology that seems to go against the grain of everything we’ve learned about big data systems over the last decade. We’ve had a lot of fun using DuckDB, and we hope you will be as enthusiastic as we are after reading this book. The book’s approach to teaching is hands-on, concise, and fast-paced, with lots of code examples.

After reading the book, you should be able to use DuckDB to analyze tabular data in a variety of formats. You will also have a handy new tool in your toolbox for data transformation, cleanup, and conversion. You can integrate it into your Python notebooks and processes to replace pandas DataFrames in situations where they do not perform well. You will also be able to build quick data analysis applications using Streamlit with DuckDB.

Let’s get started!

1.1 What is DuckDB?

DuckDB is a modern embedded analytics database that runs on your machine and lets you efficiently process and query gigabytes of data from different sources. It was created in 2018 by Mark Raasveldt and Hannes Mühleisen who, at the time, were database systems researchers at Centrum Wiskunde & Informatica (CWI), the national research institute for mathematics and computer science in the Netherlands.

1.2 Why should you care about DuckDB?

1.3 When should you use DuckDB?

1.4 When should you not use DuckDB?

1.5 Use cases

1.6 Where does DuckDB fit in?

1.7 Steps of the data processing flow

1.7.1 Data Formats and Sources

1.7.2 Data structures

1.7.3 Develop the SQL

1.7.4 Use or process the results

1.8 Summary
