14 Project: Finding the fastest way to get around NYC

 

This chapter covers

  • Setting up an Airflow pipeline from scratch
  • Structuring intermediate output data
  • Developing idempotent tasks
  • Implementing one operator to handle multiple similar transformations

Transportation in New York City (NYC) can be hectic. It’s always rush hour, but luckily there are more modes of transportation to choose from than ever. In May 2013, Citi Bike started operating in New York City with 6,000 bikes. Over the years, Citi Bike has expanded and become a popular method of transportation in the city.

Another iconic method of transportation is the Yellow Cab taxi. Taxis were introduced in NYC in the late 1890s and have always been popular. However, in recent years the number of taxi drivers has plummeted, and many drivers have started driving for ride-sharing services such as Uber and Lyft.

Regardless of which type of transportation you choose in NYC, the goal is typically to get from point A to point B as fast as possible. Luckily, the city of New York is very active in publishing data, including data on Citi Bike and Yellow Cab rides.

14.1 Understanding the data

 
 

14.1.1 Yellow Cab file share

 

14.1.2 Citi Bike REST API

 
 
 
 

14.1.3 Deciding on a plan of approach

 
 

14.2 Extracting the data

 

14.2.1 Downloading Citi Bike data

 
 

14.2.2 Downloading Yellow Cab data

 
 
 
 

14.3 Applying similar transformations to data

 
 
 

14.4 Structuring a data pipeline

 

14.5 Developing idempotent data pipelines

 
 

Summary

 