9 Practical data analysis

This chapter covers

  • Using statistical tools: sum, average, standard deviation, and frequency distributions
  • Grouping and summarizing a data set to make sense of it
  • Using tools for working with time series data: rolling average, linear regression, and more
  • Using data analysis techniques for comparing data and making predictions
  • Using correlation to understand the relationship between data variables

Congratulations, you made it to the data analysis chapter. It took a lot of work to get here. We had to fetch our data from somewhere, then clean and prepare it. Then it turned out that we had more data than we could handle, so we had to move it into a database to work with it. It’s been a long road.

Data analysis is the study of our data to understand it better, glean insights, and answer the questions that we have. For instance, when I’m searching for a place to live or to visit on vacation, I might have specific requirements for the weather. In this chapter, we’ll study 100 years’ worth of weather data from a weather station in New York City’s Central Park. Later, we’ll compare it to the weather in Los Angeles and see how it stacks up. I’m also interested in the overall trend: Is it getting hotter? Which city is heating up more quickly?

9.1 Expanding your toolkit

9.2 Analyzing the weather data

9.3 Getting the code and data

9.4 Basic data summarization

9.4.1 Sum

9.4.2 Average

9.4.3 Standard deviation

9.5 Group and summarize

9.6 The frequency distribution of temperatures

9.7 Time series

9.7.1 Yearly average temperature

9.7.2 Rolling average

9.7.3 Rolling standard deviation

9.7.4 Linear regression

9.7.5 Comparing time series

9.7.6 Stacking time series operations

9.8 Understanding relationships

9.8.1 Detecting correlation with a scatter plot

9.8.2 Types of correlation

9.8.3 Determining the strength of the correlation

9.8.4 Computing the correlation coefficient

Summary