chapter five

5 Templating tasks using the Airflow context

This chapter covers

  • Rendering variables at run time with templating
  • Mastering variable templating with the PythonOperator
  • Rendering templated variables for debugging purposes
  • Performing operations on external systems

Static data pipelines are of limited use if they always perform the same operations and can’t adapt to changes between executions (e.g., loading data for a given day). We’ve seen some examples of how Airflow allows us to make pipelines more dynamic by referencing the execution date of a DAG run. In this chapter, we’ll look more closely at how this templating functionality works.
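To make the idea concrete, here is a minimal sketch in plain Python. It is not Airflow's own mechanism (Airflow uses Jinja templates, the subject of section 5.2); it only illustrates how one definition can resolve to a different input for each run. The function name `render_path` and the path layout are hypothetical, chosen purely for illustration.

```python
from datetime import date, timedelta

# Hypothetical path pattern; the {run_date} placeholder is filled in per run.
PATH_PATTERN = "/data/events/{run_date}.json"


def render_path(pattern: str, run_date: date) -> str:
    """Fill the placeholder with the run's date in ISO format (YYYY-MM-DD)."""
    return pattern.format(run_date=run_date.isoformat())


# Three consecutive daily runs resolve to three different input files.
first_run = date(2024, 1, 1)
paths = [render_path(PATH_PATTERN, first_run + timedelta(days=i)) for i in range(3)]
for path in paths:
    print(path)
```

The pattern is written once, and only the date supplied at run time changes. Templating in Airflow follows the same principle, with the values supplied by the task context rather than by hand.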

5.1 Inspecting data for processing with Airflow

Throughout this chapter, we’ll work out several components of operators with the help of a (fictitious) stock-market-prediction tool that applies sentiment analysis. We’ll call this tool StockSense.

Wikipedia is one of the largest public information resources on the internet. In addition to the wiki pages, items such as page-view counts are publicly available. For the examples in this chapter, we’ll apply the axiom that an increase in a company’s page views shows positive sentiment, so the company’s stock price is likely to increase, and that a decrease in page views shows loss of interest, so the stock price is likely to decrease.
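The axiom above can be sketched as a small comparison of two page-view counts. This is only an illustration under stated assumptions: the function name `sentiment_signal`, the labels it returns, and the page-view numbers are all made up, and the "flat" case for equal counts is an added assumption, since the axiom covers only increases and decreases.

```python
def sentiment_signal(previous_views: int, current_views: int) -> str:
    """Map a change in page views to the expected stock direction.

    More views than before -> "up" (positive sentiment).
    Fewer views than before -> "down" (loss of interest).
    Equal views -> "flat" (an assumption; the axiom does not cover this case).
    """
    if current_views > previous_views:
        return "up"
    if current_views < previous_views:
        return "down"
    return "flat"


# Made-up page-view counts for two consecutive periods, for illustration only.
example_counts = {
    "CompanyA": (1200, 1500),
    "CompanyB": (900, 700),
    "CompanyC": (400, 400),
}

signals = {
    company: sentiment_signal(previous, current)
    for company, (previous, current) in example_counts.items()
}
print(signals)
```

Keeping the rule in a single pure function makes it easy to test independently of how the page-view counts are eventually fetched.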

5.2 Task context and Jinja templating

5.2.1 Templating operator arguments

5.2.2 Templating the PythonOperator

5.2.3 Passing additional variables to the PythonOperator

5.2.4 Inspecting templated arguments

5.3 What is available for templating

5.4 Bringing it all together

Summary