chapter five

5 Templating tasks using the Airflow context

This chapter covers

  • Rendering variables at runtime with templating
  • Mastering variable templating with the PythonOperator and other operators
  • Rendering templated variables for debugging purposes
  • Performing operations on external systems

Data pipelines are hardly useful if they always perform the same operations and can’t adapt to changes between executions (e.g., loading data for a given day). In previous chapters, we saw some examples of how Airflow lets you make your pipelines more dynamic by referencing the execution date of a DAG run. We skipped over the details then; in this chapter, we dive deeper into how this ‘templating’ functionality works.

5.1 Inspecting data for processing with Airflow

Throughout this chapter, we will work out several components of operators with the help of a (fictitious) stock market prediction tool that applies sentiment analysis, which we’ll call StockSense. Wikipedia is one of the largest public information resources on the internet. Besides the wiki pages, other items such as pageview counts are also publicly available. For the purposes of this example, we will apply the axiom that an increase in a company’s pageviews shows a positive sentiment, and the company’s stock price is likely to increase. On the other hand, a decrease in pageviews tells us of a loss in interest, and the stock price is likely to decrease.
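To make the axiom concrete, here is a minimal sketch in plain Python of the rule StockSense applies. The function name predict_direction, the "unchanged" outcome for a flat count, and the pageview numbers are all assumptions made up for illustration; they are not part of Airflow or of any Wikipedia dataset.

```python
def predict_direction(previous_views: int, current_views: int) -> str:
    """Apply the simplifying axiom to two pageview counts (hypothetical helper)."""
    if current_views > previous_views:
        return "up"  # more interest -> positive sentiment -> price likely rises
    if current_views < previous_views:
        return "down"  # less interest -> price likely falls
    return "unchanged"  # assumption: the axiom says nothing about a flat count


# Invented (previous, current) pageview counts, for illustration only.
pageviews = {"Google": (451, 522), "Amazon": (12, 9), "Apple": (30, 30)}

predictions = {
    company: predict_direction(previous, current)
    for company, (previous, current) in pageviews.items()
}
print(predictions)
```

The rule only compares two counts per company, so the pipeline's real work lies in fetching the right counts for each interval, which is where the execution date comes in.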

5.1.1 Determining how to load incremental data

5.2 Task context and Jinja templating

5.2.1 Templating operator arguments

5.2.2 Templating the PythonOperator

5.2.3 Passing additional variables to the PythonOperator

5.2.4 Inspecting templated arguments

5.2.5 What is available for templating?

5.3 Bringing it all together

5.4 Summary