Operators in Apache Airflow

This is an excerpt from Manning's book *Data Pipelines with Apache Airflow* (MEAP V05).
Note that the (lowercase) `dag` is the name assigned to the instance of the (uppercase) `DAG` class. The instance could be given any name; you could call it, for example, `rocket_dag` or `whatever_name_you_like`. We will reference this variable (the lowercase `dag`) in all operators, which tells Airflow which DAG each operator belongs to.
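To make this concrete, here is a minimal sketch (not from the book) of a DAG instance being passed to an operator via its `dag` argument; the `dag_id`, the task, and the `bash_command` are illustrative assumptions:

```python
import datetime as dt

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

# The (lowercase) variable name is arbitrary; "rocket_dag" would work just as well.
dag = DAG(
    dag_id="example_dag",  # the dag_id is what Airflow shows in its UI
    start_date=dt.datetime(2019, 1, 1),
    schedule_interval=None,
)

# Passing dag=dag tells Airflow which DAG this operator belongs to.
print_date = BashOperator(
    task_id="print_date",
    bash_command="date",
    dag=dag,
)
```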
Figure 2.4 DAGs and operators are used by Airflow users. Tasks are internal components that manage operator state and display state changes (e.g., started/finished) to the user.
Airflow provides a considerable number of built-in hooks and operators for interacting with many of the AWS services. These allow you, for example, to coordinate processes that move and transform data across the different services, as well as to deploy any required resources. For an overview of all the available hooks and operators, see the Amazon/AWS provider package[106].
Because there are so many, we won't go into the details of the individual AWS-specific hooks and operators but instead refer you to their documentation. However, tables 13.1 and 13.2 provide a brief overview of several hooks and operators, together with the AWS services they tie into and their typical applications. A demonstration of some of these hooks and operators follows in the next section.
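As a quick taste of the hook side, the following sketch (our own, not from the book) uses `S3Hook` to list keys in a bucket; the connection ID, bucket name, and prefix are hypothetical:

```python
from airflow.providers.amazon.aws.hooks.s3 import S3Hook

# Hooks are typically used inside Python callables (e.g., in a PythonOperator)
# to talk to the external service directly.
s3_hook = S3Hook(aws_conn_id="my_aws_connection")  # hypothetical connection ID

# List the keys under a given prefix in a (hypothetical) bucket.
keys = s3_hook.list_keys(bucket_name="my-ratings-bucket", prefix="ratings/")
print(keys)
```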
Listing 13.7
```python
import datetime as dt
import os
from os import path
import tempfile

import pandas as pd

from airflow import DAG, utils as airflow_utils
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.amazon.aws.operators.athena import AWSAthenaOperator
from airflow.operators.dummy_operator import DummyOperator
from airflow.operators.python_operator import PythonOperator

from custom.operators import GlueTriggerCrawlerOperator
from custom.ratings import fetch_ratings

with DAG(
    dag_id="chapter13_aws_usecase",
    description="DAG demonstrating some AWS-specific hooks and operators.",
    start_date=dt.datetime(year=2015, month=1, day=1),
    end_date=dt.datetime(year=2015, month=3, day=1),
    schedule_interval="@monthly",
    default_args={"depends_on_past": True},
) as dag:
    upload_ratings = PythonOperator(...)               # upload the monthly ratings to S3
    trigger_crawler = GlueTriggerCrawlerOperator(...)  # update the Glue data catalog
    rank_movies = AWSAthenaOperator(...)               # rank movies with an Athena query

    upload_ratings >> trigger_crawler >> rank_movies
```
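The listing elides the operators' arguments. As one illustration of how the first task could be filled in, here is a hedged sketch of a `python_callable` for `upload_ratings` that uses `S3Hook` to stage the fetched ratings in S3, relying on the imports from the listing above. The connection ID, bucket name, key layout, and the exact signature and return type of `fetch_ratings` are assumptions on our part:

```python
def _upload_ratings(**context):
    # Derive the month to process from the Airflow execution date.
    execution_date = context["execution_date"]
    year, month = execution_date.year, execution_date.month

    # Fetch the ratings for the given month
    # (signature of fetch_ratings assumed; a pandas DataFrame is assumed back).
    ratings = fetch_ratings(year=year, month=month)

    # Write the ratings to a temporary CSV file and upload it to S3.
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = path.join(tmp_dir, "ratings.csv")
        ratings.to_csv(tmp_path, index=False)

        s3_hook = S3Hook(aws_conn_id="my_aws_connection")  # hypothetical connection
        s3_hook.load_file(
            filename=tmp_path,
            key=f"ratings/{year}/{month:02d}.csv",  # hypothetical key layout
            bucket_name="my-ratings-bucket",        # hypothetical bucket
            replace=True,
        )
```

A callable like this would be wired into the DAG via `PythonOperator(task_id="upload_ratings", python_callable=_upload_ratings, ...)`, keeping the AWS interaction inside the hook rather than in the DAG definition itself.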