4 Templating tasks using the Airflow context

 

This chapter covers

  • Rendering variables at runtime with templating
  • Variable templating with the PythonOperator versus other operators
  • Rendering templated variables for debugging purposes
  • Performing operations on external systems

In the previous chapters, we scratched the surface of how DAGs and operators work together and how to schedule a workflow in Airflow. In this chapter, we look in depth at what operators represent, how they function, and when and how they are executed. We also demonstrate how operators can communicate with remote systems via hooks, which lets you perform tasks such as loading data into a database, running a command in a remote environment, and performing workloads outside of Airflow.

4.1 Inspecting data for processing with Airflow

Throughout this chapter, we will work out several components of operators with the help of a (fictitious) stock market prediction tool that applies sentiment analysis, which we’ll call StockSense. Wikipedia is one of the largest public information resources on the internet. Besides the wiki pages, other items such as pageview counts are also publicly available. For the purposes of this example, we will apply the axiom that an increase in a company’s pageviews indicates positive sentiment, and the company’s stock is likely to increase. Conversely, a decrease in pageviews signals a loss of interest, and the stock price is likely to decrease.
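To make the StockSense axiom concrete, here is a minimal sketch in plain Python of the rule described above. The function name `sentiment_signal`, the company names, and the pageview counts are all hypothetical and chosen only for illustration. The "flat" outcome for unchanged pageviews is also an assumption, since the axiom only covers increases and decreases.

```python
from typing import Dict, Tuple


def sentiment_signal(previous_views: int, current_views: int) -> str:
    """Map a change in pageviews to an expected stock direction.

    Hypothetical helper: it encodes the axiom that more pageviews means
    positive sentiment ("up") and fewer pageviews means lost interest ("down").
    """
    if current_views > previous_views:
        return "up"
    if current_views < previous_views:
        return "down"
    # Assumption: the axiom does not cover unchanged counts, so report "flat".
    return "flat"


# Made-up pageview counts as (previous interval, current interval) pairs.
pageviews: Dict[str, Tuple[int, int]] = {
    "CompanyA": (450, 520),
    "CompanyB": (300, 280),
    "CompanyC": (100, 100),
}

# Apply the rule to every company in the sample data.
signals = {
    company: sentiment_signal(previous, current)
    for company, (previous, current) in pageviews.items()
}

for company, signal in signals.items():
    print(f"{company}: {signal}")
```

The rule only compares two counts, so the interesting work lies in obtaining the pageview counts for the right time interval.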

4.1.1 Determining how to load incremental data

4.2 Task context and Jinja templating

4.2.1 Templating operator arguments

4.2.2 What is available for templating?

4.2.3 Templating the PythonOperator

4.2.4 Providing variables to the PythonOperator

4.2.5 Inspecting templated arguments

4.3 Hooking up other systems

Summary
