10 Merging, joining, and concatenating

 

This chapter covers

  • Concatenating DataFrames on the vertical and horizontal axes
  • Merging DataFrames with inner joins, outer joins, and left joins
  • Finding unique and shared values between DataFrames
  • Joining DataFrames by index labels

As a business domain grows in complexity, it becomes increasingly difficult to store all data in a single collection. To solve this problem, data administrators split data across multiple tables. Then they associate the tables with one another so it is easy to identify the relationships among them.

You may have previously worked with a database such as PostgreSQL, MySQL, or Oracle. Relational database management systems (RDBMS) follow the paradigm described in the preceding paragraph. A database consists of tables. A table holds records for one domain model. A table consists of rows and columns. A row stores information for one record. A column stores an attribute for that record. Tables connect through column keys. If you haven’t worked with databases before, you can consider a table to be effectively equivalent to a pandas DataFrame.

10.1 Introducing the data sets

10.2 Concatenating data sets

10.3 Missing values in concatenated DataFrames

10.4 Left joins

10.5 Inner joins

10.6 Outer joins

10.7 Merging on index labels

10.8 Coding challenge

10.8.1 Problems

10.8.2 Solutions

Summary