chapter eight

8 First steps with data frames

This chapter covers

  • Working with compressed files
  • Reading and writing CSV files, Apache Arrow data, and SQLite databases
  • Getting columns from a data frame
  • Computing summary statistics of data frame contents
  • Visualizing data distribution using histograms

In this chapter, you will learn the basic principles of working with data frames in Julia, as provided by the DataFrames.jl package. Data frame objects are flexible data structures for working with tabular data. As I explained in chapter 1, tabular data in general, and a data frame in particular, is a two-dimensional structure consisting of cells. Each row has the same number of cells and describes one observation. Each column has the same number of cells, stores the same feature across all observations, and has a name.
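To make the row and column description concrete, here is a minimal sketch, assuming the DataFrames.jl package is installed. The column names `name` and `value` and their contents are hypothetical toy data chosen only for illustration; they are not taken from the data set used later in this chapter.

```julia
using DataFrames

# Hypothetical toy data: three observations (rows), two named features (columns).
df = DataFrame(name=["a", "b", "c"], value=[1, 2, 3])

nrow(df)    # number of rows, one per observation
ncol(df)    # number of columns, one per feature
names(df)   # column names, returned as a vector of strings
df.value    # a single column: the same feature across all observations
```

Every column here has three cells because there are three observations, and each column is addressed by its name.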

After reading part 1, you have acquired the essential skills for working with Julia to analyze data. Starting with this chapter, you will learn how to perform data analysis tasks in Julia efficiently. We start by explaining how to work with tabular data, as most statistical data sets have this form. For this reason, essentially every ecosystem used for data science provides a data frame type, for example:

8.1 Fetching, unpacking, and inspecting the data

8.2 Loading the data to a data frame

8.3 Getting a column out of a data frame

8.3.1 Data frame's storage model

8.3.2 Treating a data frame column as a property

8.3.3 Getting a column using data frame indexing

8.3.4 Visualizing data stored in columns of a data frame

8.4 Reading and writing data frames using different formats

8.5 Summary