Introduction to Apache Airflow

Apache Airflow (or simply Airflow) is a platform for programmatically authoring, scheduling, and monitoring workflows. It has become a staple tool for data engineers because it makes complex data pipelines straightforward to schedule and run: Airflow ensures that each task in a pipeline executes in the correct order and receives the resources it needs.

In Airflow, you author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes those tasks on an array of workers while respecting the dependencies you specify. Rich command-line utilities make it easy to perform complex operations on DAGs, and the web-based user interface makes it simple to visualize pipelines running in production, monitor their progress, and troubleshoot issues when they arise.