concept PythonOperator in category apache airflow

Data Pipelines with Apache Airflow MEAP V05

This is an excerpt from Manning's book Data Pipelines with Apache Airflow MEAP V05.

The PythonOperator in Airflow is responsible for running any Python code. Just like the BashOperator used before, this and all other operators require a task_id, which is referenced when running a task and displayed in the UI. The use of a PythonOperator is always twofold: we define the operator itself, and we point its python_callable argument to a callable, typically a function.

4.2.3   Templating the PythonOperator

The PythonOperator is an exception to the templating shown in the previous section. With the BashOperator (and all other operators in Airflow), you provide a string to the bash_command argument (or whatever the argument is named in other operators), which is automatically templated at runtime. With the PythonOperator, the code to run is not passed as a string that can be templated with the runtime context, but as a python_callable argument to which the runtime context can be applied.

Let’s inspect the code downloading the Wikipedia pageviews as shown above with the BashOperator, but now implemented with the PythonOperator. Functionally this results in the same behaviour:

Listing 4.5 Downloading Wikipedia pageviews with the PythonOperator
from urllib import request

import airflow.utils.dates
from airflow import DAG
from airflow.operators.python_operator import PythonOperator

dag = DAG(
    dag_id="stocksense",
    start_date=airflow.utils.dates.days_ago(1),
    schedule_interval="@hourly",
)


def _get_data(execution_date, **_):
    # Called by the PythonOperator; execution_date comes from the task context,
    # and **_ swallows all other context variables.
    year, month, day, hour, *_ = execution_date.timetuple()
    url = (
        "https://dumps.wikimedia.org/other/pageviews/"
        f"{year}/{year}-{month:0>2}/pageviews-{year}{month:0>2}{day:0>2}-{hour:0>2}0000.gz"
    )
    output_path = "/tmp/wikipageviews.gz"
    request.urlretrieve(url, output_path)


# provide_context=True passes the task context to _get_data as keyword arguments.
get_data = PythonOperator(
    task_id="get_data",
    python_callable=_get_data,
    provide_context=True,
    dag=dag,
)

Functions are first-class citizens in Python, and we provide a callable[11] (a function is a callable object) to the python_callable argument of the PythonOperator. On execution, the PythonOperator executes the provided callable, which could be any function. Since it is a function and not a string, as with all other operators, the code within the function cannot be automatically templated.
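To make the idea of "providing a callable" concrete, here is a minimal sketch that runs without Airflow. MiniPythonOperator, _say_hello and Greeter are hypothetical names invented for this illustration. The class is a toy stand-in, not Airflow's implementation. It only shows that the function object itself is handed over (no parentheses) and is called later, at execution time.

```python
# Toy stand-in for illustration only; this is NOT Airflow's PythonOperator.
class MiniPythonOperator:
    def __init__(self, task_id, python_callable):
        # Reject anything that cannot be called, e.g. a plain string.
        if not callable(python_callable):
            raise TypeError("python_callable must be callable")
        self.task_id = task_id
        self.python_callable = python_callable

    def execute(self):
        # Running the task simply means calling the stored object.
        return self.python_callable()


def _say_hello():
    return "hello from a function"


class Greeter:
    """Instances are callable because the class defines __call__."""

    def __call__(self):
        return "hello from a callable object"


# Pass the function object itself (_say_hello), not its result (_say_hello()).
function_task = MiniPythonOperator(task_id="function_task", python_callable=_say_hello)
object_task = MiniPythonOperator(task_id="object_task", python_callable=Greeter())

function_result = function_task.execute()
object_result = object_task.execute()
```

Because the operator receives an object to call rather than text to render, there is nothing for a template engine to substitute inside it. This is why the context has to reach the function some other way.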

Instead, the task context variables can be passed as arguments to the given function. There is one side note: we must set provide_context=True in order to provide the task instance context. Without provide_context=True, the PythonOperator still calls the callable, but passes it no task context variables, so a function that expects one, such as execution_date, is called without it.

Figure 4.4 Providing task context with a PythonOperator