chapter four

4 Breaking down a DAG


This chapter covers

  • Rendering variables at runtime with templating
  • What data is available for templating
  • How templating differs between the PythonOperator and all other operators
  • Performing operations on external systems
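To make the first three points concrete, here is a minimal, stdlib-only sketch of the idea. It is not Airflow code. Airflow renders templates with Jinja, whereas the hypothetical render helper below only substitutes plain {{ name }} placeholders using a regex. The names ToyBashTask, ToyPythonTask, build_context and get_year are illustrative. The sketch assumes two variables that Airflow really does provide in the task context, execution_date and ds (the execution date formatted as YYYY-MM-DD). It also loosely follows how Airflow 2 passes context variables to a PythonOperator callable as keyword arguments.

```python
import inspect
import re
from datetime import datetime

# Hypothetical toy renderer. Airflow itself uses Jinja; this regex only handles
# plain "{{ name }}" placeholders (no filters, no attribute access).
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace every {{ name }} in template with str(context[name])."""
    return _PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


def build_context(execution_date: datetime) -> dict:
    """Hypothetical helper: builds two variables Airflow also provides."""
    return {"execution_date": execution_date, "ds": execution_date.strftime("%Y-%m-%d")}


class ToyBashTask:
    """Stand-in for an operator whose string argument is templated."""

    # Plays the role of Airflow's template_fields: only these attributes are rendered.
    template_fields = ("bash_command",)

    def __init__(self, bash_command: str):
        self.bash_command = bash_command  # stored unrendered at DAG definition time

    def execute(self, context: dict) -> str:
        # Rendering happens at runtime, with the context of this particular run.
        rendered = {f: render(getattr(self, f), context) for f in self.template_fields}
        return rendered["bash_command"]  # a real operator would run the command here


class ToyPythonTask:
    """Stand-in for a PythonOperator: a callable is not a string, so nothing is
    rendered; context values are passed to it as keyword arguments instead."""

    def __init__(self, python_callable):
        self.python_callable = python_callable

    def execute(self, context: dict):
        params = inspect.signature(self.python_callable).parameters
        takes_var_kw = any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params.values())
        # **kwargs receives the whole context; otherwise pass only the declared names.
        kwargs = context if takes_var_kw else {k: v for k, v in context.items() if k in params}
        return self.python_callable(**kwargs)


def get_year(execution_date):
    return execution_date.year


if __name__ == "__main__":
    ctx = build_context(datetime(2019, 7, 19, 13))
    print(ToyBashTask("echo {{ ds }}").execute(ctx))  # echo 2019-07-19
    print(ToyPythonTask(get_year).execute(ctx))  # 2019
```

The contrast is the point of the sketch: a string argument can be stored as-is and rendered only when a run starts, but a Python function cannot be "rendered", so the PythonOperator-style task hands the runtime values to the function instead.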

In the previous chapters, we only scratched the surface of how DAGs and operators work together and how Airflow schedules a workflow. In this chapter, we go into more depth on what operators represent, how they function, and when and how they are executed. We also demonstrate how operators can communicate with remote systems via hooks, which lets you perform tasks such as loading data into a database, running a command in a remote environment, and submitting a Spark job to a YARN cluster.
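The division of labor between an operator and a hook can be sketched with Python's built-in sqlite3 module standing in for the external database. ToySqliteHook and ToyLoadTask are hypothetical names, and the table name and rows are made-up sample data. Airflow's own hooks typically look up connection details by a connection ID rather than taking a database path, so treat this as an illustration of the idea, not of Airflow's API. The task decides what to load; the hook encapsulates how to reach the system.

```python
import sqlite3


class ToySqliteHook:
    """Hypothetical hook: hides how to connect to an external system (here a
    SQLite database) behind a small interface."""

    def __init__(self, database: str = ":memory:"):
        self._database = database
        self._conn = None

    def get_conn(self) -> sqlite3.Connection:
        # Connect lazily on first use and reuse the same connection afterwards.
        if self._conn is None:
            self._conn = sqlite3.connect(self._database)
        return self._conn

    def insert_rows(self, table: str, rows: list) -> None:
        if not rows:
            return
        placeholders = ", ".join("?" for _ in rows[0])
        conn = self.get_conn()
        # The table name is interpolated directly, so it must come from trusted code.
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        conn.commit()


class ToyLoadTask:
    """Hypothetical operator: knows *what* to load and delegates *how* to the hook."""

    def __init__(self, hook: ToySqliteHook, table: str, rows: list):
        self.hook, self.table, self.rows = hook, table, rows

    def execute(self) -> int:
        self.hook.insert_rows(self.table, self.rows)
        return len(self.rows)


if __name__ == "__main__":
    hook = ToySqliteHook()
    hook.get_conn().execute("CREATE TABLE pageview_counts (pagename TEXT, pageviewcount INTEGER)")
    loaded = ToyLoadTask(hook, "pageview_counts", [("Amazon", 10), ("Google", 20)]).execute()
    total = hook.get_conn().execute("SELECT SUM(pageviewcount) FROM pageview_counts").fetchone()[0]
    print(loaded, total)  # 2 30
```

Swapping the database for another system would only mean writing a different hook; the task that decides which rows to load would not need to change.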

4.1 Predicting stock popularity with sentiment analysis

4.1.1 Downloading Wikipedia pageviews

4.2 Task context & Jinja templating

4.2.1 Templating operator arguments

4.2.2 Templating the PythonOperator

4.2.3 Providing variables to the PythonOperator

4.2.4 Inspecting templated arguments

4.3 Hooking up other systems

4.4 Summary