concept PythonOperator in category apache airflow

This is an excerpt from Manning's book Data Pipelines with Apache Airflow MEAP V05.
The PythonOperator in Airflow is responsible for running any Python code. Just like the BashOperator used before, this and all other operators require a task_id. The task_id is referenced when running a task and displayed in the UI. The use of a PythonOperator is always twofold:
4.2.3 Templating the PythonOperator
The PythonOperator is an exception to the templating shown in the previous section. With the BashOperator (and all other operators in Airflow), you provide a string to the bash_command argument (or whatever the argument is named in other operators), and that string is automatically templated at runtime. The PythonOperator deviates from this standard: it doesn’t take arguments that can be templated with the runtime context. Instead, it takes a python_callable argument, and the runtime context is applied inside that callable.
Let’s revisit the code that downloads the Wikipedia pageviews, shown above with the BashOperator, now implemented with the PythonOperator. Functionally, this results in the same behaviour:
Listing 4.5 Downloading Wikipedia pageviews with the PythonOperator
    from urllib import request

    import airflow.utils.dates
    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(
        dag_id="stocksense",
        start_date=airflow.utils.dates.days_ago(1),
        schedule_interval="@hourly",
    )


    # The callable receives execution_date from the task context;
    # **_ swallows all other context variables.
    def _get_data(execution_date, **_):
        year, month, day, hour, *_ = execution_date.timetuple()
        url = (
            "https://dumps.wikimedia.org/other/pageviews/"
            f"{year}/{year}-{month:0>2}/pageviews-{year}{month:0>2}{day:0>2}-{hour:0>2}0000.gz"
        )
        output_path = "/tmp/wikipageviews.gz"
        request.urlretrieve(url, output_path)


    # The PythonOperator takes a Python function rather than a string.
    get_data = PythonOperator(
        task_id="get_data",
        python_callable=_get_data,
        provide_context=True,
        dag=dag,
    )

Functions are first-class citizens in Python, and we provide a callable[11] (a function is a callable object) to the python_callable argument of the PythonOperator. On execution, the PythonOperator executes the provided callable, which could be any function. Since it is a function and not a string, as with all other operators, the code within the function cannot be automatically templated.
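The URL-building step of Listing 4.5 can be tried outside Airflow, which makes the date formatting easier to check. The sketch below is a minimal illustration, assuming a hypothetical helper named build_pageviews_url that is not part of the listing; it repeats the same f-string but skips the download.

```python
from datetime import datetime


def build_pageviews_url(execution_date):
    """Hypothetical helper: the URL logic of Listing 4.5 without the download."""
    # timetuple() yields (year, month, day, hour, ...); the rest is discarded.
    year, month, day, hour, *_ = execution_date.timetuple()
    return (
        "https://dumps.wikimedia.org/other/pageviews/"
        # {value:0>2} right-aligns the number and pads it with zeros to width 2.
        f"{year}/{year}-{month:0>2}/pageviews-{year}{month:0>2}{day:0>2}-{hour:0>2}0000.gz"
    )


# An arbitrary example date: 1 July 2019, 01:00.
url = build_pageviews_url(datetime(2019, 7, 1, 1))
print(url)
# https://dumps.wikimedia.org/other/pageviews/2019/2019-07/pageviews-20190701-010000.gz
```

Keeping the URL construction in a small function of its own is one possible design choice; it lets the formatting be checked without a scheduler or a network connection.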
Instead, the task context variables can be passed as arguments to the given function. There is one caveat: we must set provide_context=True for the task instance context to be provided. Running the PythonOperator without provide_context=True still executes the callable, but no task context variables are passed to it.
Figure 4.4 Providing task context with a PythonOperator