6 Integrating with the Python ecosystem
This chapter covers
- The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
- Ingesting data from pandas DataFrames, Apache Arrow Tables and more via the Python API
- Querying pandas DataFrames with DuckDB methods
- Exporting data to various DataFrame formats and Apache Arrow Tables
- Using DuckDB’s relational API to compose queries
Up until now, we’ve consistently used the DuckDB CLI to manage and execute our queries. This tool is highly effective for on-the-spot analysis and for CLI-based pipelines. Many data workflows, however, rely heavily on Python and its ecosystem; pandas DataFrames, for example, are hard to avoid. In this chapter we will learn that DuckDB’s Python API goes well beyond implementing the Python DB-API. It lets you not only use the embedded database in your Python process, but also query Python objects as if they were tables. At the same time, you can easily convert query results into DataFrames.