8 First steps with data frames

 

This chapter covers

  • Working with compressed files
  • Reading and writing CSV files, Apache Arrow data, and SQLite databases
  • Getting columns from a data frame
  • Computing summary statistics of data frame contents
  • Visualizing data distribution by using histograms

In this chapter, you will learn the basic principles of working with data frames in Julia, which are provided by the DataFrames.jl package. Data frame objects are flexible data structures that allow you to work with tabular data. As I explained in chapter 1, tabular data in general, and a data frame in particular, is a two-dimensional structure consisting of cells. Each row has the same number of cells and provides information about one observation of the data. Each column has the same number of cells, stores information about the same feature across observations, and has a name.
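To make the row and column structure concrete, here is a minimal sketch, assuming the DataFrames.jl package is installed. The column names and values are made up for illustration and do not come from the data set used later in the chapter.

```julia
using DataFrames  # assumption: DataFrames.jl is installed in the active environment

# Hypothetical toy data: each row is one observation, each column is one feature.
df = DataFrame(name=["Alice", "Bob", "Clyde"],
               age=[25, 31, 42],
               height_cm=[165.0, 180.5, 172.3])

n_observations = nrow(df)   # number of rows
n_features = ncol(df)       # number of columns
column_names = names(df)    # column names, returned as a vector of strings

println("rows: ", n_observations, ", columns: ", n_features)
println("column names: ", column_names)
```

Every column holds the same number of cells (three here), and each column is identified by its name, which matches the description above.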

After reading part 1, you have acquired the essential skills for working with Julia to analyze data. Starting with this chapter, you will learn how to efficiently perform data analysis tasks in Julia. We start by explaining how to work with tabular data, as most statistical data sets have this form. For this reason, essentially every ecosystem used for doing data science provides a data frame type. For example:

8.1 Fetching, unpacking, and inspecting the data

8.1.1 Downloading the file from the web

8.1.2 Working with bzip2 archives

8.1.3 Inspecting the CSV file

8.2 Loading the data to a data frame

8.2.1 Reading a CSV file into a data frame

8.2.2 Inspecting the contents of a data frame

8.2.3 Saving a data frame to a CSV file

8.3 Getting a column out of a data frame

8.3.1 Understanding the data frame’s storage model

8.3.2 Treating a data frame column as a property

8.3.3 Getting a column by using data frame indexing