concept idempotent task in category apache airflow

appears as: An idempotent task, idempotent tasks
Data Pipelines with Apache Airflow MEAP V05

This is an excerpt from Manning's book Data Pipelines with Apache Airflow MEAP V05.

Figure 3.9 An idempotent task produces the same result, no matter how many times you run it. Idempotency ensures consistency and the ability to deal with failure.
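The property described in the caption can be sketched with two small file-writing functions. This is only an illustration, not code from the book: the function names, the `events` file names, and the `run_date` parameter are all hypothetical. The first function appends on every run, so a rerun duplicates data; the second writes to a file named after the run date and overwrites it, so a rerun leaves the same result.

```python
import json
import tempfile
from pathlib import Path


def write_appending(output_dir: Path, run_date: str, rows: list) -> Path:
    """Non-idempotent (hypothetical example): every rerun appends the rows again."""
    path = output_dir / "events.jsonl"
    with path.open("a") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path


def write_partitioned(output_dir: Path, run_date: str, rows: list) -> Path:
    """Idempotent (hypothetical example): one file per run date, overwritten on rerun."""
    path = output_dir / f"events_{run_date}.jsonl"
    with path.open("w") as f:  # "w" truncates, so earlier output for this date is replaced
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path


def count_lines(path: Path) -> int:
    """Number of records currently stored in the file."""
    return len(path.read_text().splitlines())


if __name__ == "__main__":
    out = Path(tempfile.mkdtemp())
    rows = [{"id": 1}, {"id": 2}]
    for _ in range(3):  # simulate the same task instance being run three times
        appended = write_appending(out, "2019-01-01", rows)
        partitioned = write_partitioned(out, "2019-01-01", rows)
    print(count_lines(appended), count_lines(partitioned))
```

After three runs for the same date, the appended file holds three copies of the rows, while the partitioned file still holds exactly one.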

Airflow now triggers jobs that download, transform, and store data in the Postgres database at 15-minute intervals. For a real user-facing application you would probably want a better-looking, searchable front end, but from the back-end perspective we now have an automated data pipeline that runs every 15 minutes and shows whether a taxi or a Citi Bike is faster between given locations at given times. To recap: as mentioned in Section 15.3, the pipeline applies a single operator that can run various transformation tasks, instead of re-implementing and re-applying the PythonOperator, which results in concise code with no duplication. In Section 15.4 we elaborate on how to structure a data pipeline: by persisting intermediate results, we create resumable pipelines. Lastly, in Section 15.5 we discuss a challenge in developing idempotent tasks and demonstrate by example how to solve it.
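The excerpt does not show how the book makes the database write idempotent, so the following is only one common approach, sketched under stated assumptions: before inserting the rows for an interval, delete any rows already stored for that interval, and do both in a single transaction. To keep the sketch self-contained it uses Python's built-in `sqlite3` as a stand-in for Postgres; the `trips` table, its columns, and the `store_interval` function are hypothetical names.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical results table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (interval_start TEXT, transport TEXT, duration_s INTEGER)"
    )
    return conn


def store_interval(conn: sqlite3.Connection, interval_start: str, rows: list) -> None:
    """Replace all rows for one interval, so reruns do not create duplicates."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM trips WHERE interval_start = ?", (interval_start,))
        conn.executemany(
            "INSERT INTO trips (interval_start, transport, duration_s) VALUES (?, ?, ?)",
            [(interval_start, transport, duration) for transport, duration in rows],
        )


def count_rows(conn: sqlite3.Connection) -> int:
    """Total number of rows currently in the table."""
    return conn.execute("SELECT COUNT(*) FROM trips").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    rows = [("taxi", 600), ("citibike", 540)]
    store_interval(conn, "2019-01-01T00:00", rows)
    store_interval(conn, "2019-01-01T00:00", rows)  # rerun of the same interval
    print(count_rows(conn))
```

Because the delete and the insert share one transaction, a failure partway through leaves the previous rows for that interval in place, and a successful rerun leaves the table in the same state as a single run.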
