6 Integrating with the Python ecosystem

 

This chapter covers

  • The differences between DuckDB’s implementation of Python DB-API 2.0 and the DuckDB relational API
  • Ingesting data from pandas DataFrames, Apache Arrow Tables and more via the Python API
  • Querying pandas DataFrames with DuckDB methods
  • Exporting data to various DataFrame formats and Apache Arrow Tables
  • Using DuckDB’s relational API to compose queries

Up until now, we’ve consistently used the DuckDB CLI to manage and execute our queries. This tool is highly effective for on-the-spot analysis and for CLI-based pipelines. Many data workflows, however, rely heavily on Python and its ecosystem. pandas DataFrames, for example, are hard to avoid. In this chapter, we will learn that DuckDB’s Python API goes well beyond implementing the Python DB-API. It lets you not only use the embedded database in your Python process but also query Python objects as if they were tables. At the same time, you can easily convert query results into DataFrames.

6.1 Getting started

 
 
 
 

6.1.1 Installing the Python package

 
 
 

6.1.2 Opening up a database connection

 
 

6.2 Using the relational API

 
 

6.2.1 Ingesting CSV data with the Python API

 
 
 
 

6.2.2 Composing queries

 
 
 
 

6.2.3 SQL querying

 
 

6.3 Querying pandas DataFrames

 
 
 

6.4 User-defined functions

 
 

6.5 Interoperability with Apache Arrow and Polars

 
 
 
 

6.6 Summary

 
 
 
 