concept kubernetespodoperator in category apache airflow

This is an excerpt from Manning's book Data Pipelines with Apache Airflow MEAP V05.
Some people prefer to rely on generic operators such as the built-in DockerOperator and the KubernetesPodOperator to execute their tasks. An advantage of this approach is that you can keep your Airflow installation lean: Airflow only coordinates the containerized jobs, so all dependencies of a specific task can live inside that task's container. We'll explore this approach further in a future chapter.
Besides running Airflow itself in containers, you can also use Airflow to run your tasks as containers. In practice, this means you can define tasks using container-based operators (such as the DockerOperator and the KubernetesPodOperator). When executed, these operators start a container and wait for it to finish running whatever it was supposed to do (similar to docker run).
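The "start a container, then wait for it to finish" pattern can be sketched as a simple polling loop. This is a simplified illustration, not how the DockerOperator or KubernetesPodOperator are actually implemented: `get_phase` is a hypothetical callback standing in for a query to Docker or the Kubernetes API, the phase names are borrowed from Kubernetes pod phases (Pending, Running, Succeeded, Failed), and `fake_pod` is a hypothetical stand-in that replays a fixed sequence of phases.

```python
import time
from typing import Callable, Sequence


def wait_for_completion(
    get_phase: Callable[[], str],
    poll_interval: float = 0.0,
    max_polls: int = 100,
) -> str:
    """Poll until the container reaches a terminal phase (simplified sketch)."""
    for _ in range(max_polls):
        phase = get_phase()
        if phase == "Succeeded":
            return phase
        if phase == "Failed":
            # Raising is what marks the (hypothetical) task as failed here.
            raise RuntimeError("container exited with a failure")
        # Pending or Running: the container has not finished yet.
        time.sleep(poll_interval)
    raise TimeoutError(f"container still not finished after {max_polls} polls")


def fake_pod(phases: Sequence[str]) -> Callable[[], str]:
    """Hypothetical stand-in for the cluster: replays the given phases,
    then keeps reporting the last one."""
    remaining = iter(phases)
    return lambda: next(remaining, phases[-1])


print(wait_for_completion(fake_pod(["Pending", "Running", "Running", "Succeeded"])))
# prints: Succeeded
```

A real poller would wait several seconds between checks; the default `poll_interval` of zero only keeps the demo instant.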
To start running our tasks on Kubernetes, we need to replace our DockerOperators with instances of the KubernetesPodOperator. As the name implies, the KubernetesPodOperator runs tasks within pods on a Kubernetes cluster. The basic API of the operator is as follows:
Listing 11.31.
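```python
import os

# Module path for the cncf.kubernetes provider (Airflow 2); on Airflow 1.10.x
# the operator is airflow.contrib.operators.kubernetes_pod_operator instead.
from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator

# volume and volume_mount are assumed to be defined elsewhere in the DAG file.
fetch_ratings = KubernetesPodOperator(
    task_id="fetch_ratings",
    image="airflowbook/movielens-fetch",  # Docker image to run in the pod
    cmds=["fetch-ratings"],  # Executable to run inside the container
    arguments=[  # Arguments passed to the executable (templated)
        "--start_date", "{{ds}}",
        "--end_date", "{{next_ds}}",
        "--output_path", "/data/ratings/{{ds}}.json",
        "--user", os.environ["MOVIELENS_USER"],
        "--password", os.environ["MOVIELENS_PASSWORD"],
        "--host", os.environ["MOVIELENS_HOST"],
    ],
    namespace="airflow",  # Kubernetes namespace to run the pod in
    name="fetch-ratings",  # Name of the pod
    cluster_context="docker-desktop",  # Which cluster (kubeconfig context) to use
    volumes=[volume],  # Volumes to attach to the pod...
    volume_mounts=[volume_mount],  # ...and where to mount them in the container
)
```

Similar to the DockerOperator, the first few arguments tell the KubernetesPodOperator how to run our task as a container: the image argument tells Kubernetes which Docker image to use, while the cmds and arguments parameters define which executable to run (fetch-ratings) and which arguments to pass to it. The remaining arguments tell Kubernetes which cluster to use (cluster_context), in which namespace to run the pod (namespace), what name to give the pod (name), and which volumes to attach to the pod and where to mount them (volumes and volume_mounts).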
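To make the mapping from operator arguments to a Kubernetes pod more concrete, the following builds the kind of Pod manifest these arguments describe. This is a simplified sketch, not the operator's actual implementation: `build_pod_manifest` is a hypothetical helper, and choices such as `restartPolicy: Never` and reusing the pod name as the container name are assumptions. It does follow how Kubernetes interprets the fields: cmds becomes the container's `command` (overriding the image's ENTRYPOINT) and arguments becomes `args` (overriding its CMD). cluster_context does not appear, because it selects which cluster to talk to rather than describing the pod. The volume name `data-volume`, its claim name, and the dates (standing in for rendered `{{ds}}` and `{{next_ds}}` values) are hypothetical.

```python
import json
from typing import Dict, List, Optional


def build_pod_manifest(
    name: str,
    namespace: str,
    image: str,
    cmds: List[str],
    arguments: List[str],
    volumes: Optional[List[Dict]] = None,
    volume_mounts: Optional[List[Dict]] = None,
) -> Dict:
    """Hypothetical sketch: a Pod manifest from KubernetesPodOperator-style args."""
    container = {
        "name": name,  # assumption: reuse the pod name for the container
        "image": image,
        "command": cmds,  # replaces the image's ENTRYPOINT
        "args": arguments,  # replaces the image's CMD
        "volumeMounts": volume_mounts or [],
    }
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "containers": [container],
            "volumes": volumes or [],
            # Assumption: a batch task runs once and is not restarted.
            "restartPolicy": "Never",
        },
    }


manifest = build_pod_manifest(
    name="fetch-ratings",
    namespace="airflow",
    image="airflowbook/movielens-fetch",
    cmds=["fetch-ratings"],
    arguments=["--start_date", "2019-01-01", "--end_date", "2019-01-02",
               "--output_path", "/data/ratings/2019-01-01.json"],
    volumes=[{"name": "data-volume",
              "persistentVolumeClaim": {"claimName": "data-volume"}}],
    volume_mounts=[{"name": "data-volume", "mountPath": "/data"}],
)
print(json.dumps(manifest, indent=2))
```

Mounting the volume at /data is what lets the container write its output to /data/ratings/... and keep it after the pod exits.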
Figure 11.10 - Several successful runs of the recommender DAG based on the KubernetesPodOperator.