
Simplify Airflow DAG Creation and Maintenance with Hamilton | by Stefan Krawczyk | Jul 2023


How Hamilton can help you write more maintainable Airflow DAGs

An abstract illustration of how Airflow & Hamilton relate. Airflow helps bring it all together, while Hamilton helps make the innards manageable. Image from Pixabay.

This post was written in collaboration with Thierry Jean and originally appeared here.

This post walks you through the benefits of having two open source projects, Hamilton and Airflow, and their directed acyclic graphs (DAGs), work in tandem. At a high level, Airflow is responsible for orchestration (think macro) and Hamilton helps you author clean and maintainable data transformations (think micro).

For those who are unfamiliar with Hamilton, we point you to an interactive overview on tryhamilton.dev, or our other posts, e.g. this one. Otherwise, we will talk about Hamilton at a high level and point to reference documentation for more details. For reference, I’m one of the co-creators of Hamilton.

For those still mentally trying to grasp how the two can run together: the reason you can run Hamilton with Airflow is that Hamilton is just a library with a small dependency footprint, so you can add Hamilton to your Airflow setup in no time!

Just to recap, Airflow is the industry standard for orchestrating data pipelines. It powers all sorts of data initiatives, including ETL, ML pipelines, and BI. Since its inception in 2014, Airflow users have faced certain rough edges when it comes to authoring and maintaining data pipelines:

  1. Maintainably managing the evolution of workflows; what starts simple can invariably get complex.
  2. Writing modular, reusable, and testable code that runs within an Airflow task.
  3. Tracking lineage of code and data artifacts that an Airflow DAG produces.

This is where we believe Hamilton can help! Hamilton is a Python micro-framework for writing data transformations. In short, one writes Python functions in a “declarative” style, which Hamilton parses and connects into a graph based on their names, arguments, and type annotations. Specific outputs can be requested, and Hamilton will execute the required function path to produce them. Because it doesn’t provide macro orchestration capabilities, it pairs well with Airflow by helping data professionals write cleaner and more reusable code for Airflow DAGs.

The Hamilton paradigm in a picture. This example shows how one would map procedural pandas code to Hamilton functions that define a DAG. Note: Hamilton can be used for any Python object types, not just pandas. Image by author.
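To make the paradigm concrete, here is a minimal sketch of a Hamilton module and the driver code that runs it. This is our illustration, not code from the original post; the function names and toy inputs are made up.

```python
# my_functions.py -- a module of Hamilton transform functions.
# A function's name defines an output; its parameter names declare its inputs.
import pandas as pd


def avg_3wk_spend(spend: pd.Series) -> pd.Series:
    """Rolling three-week average of marketing spend."""
    return spend.rolling(3).mean()


def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Marketing spend per signup."""
    return spend / signups
```

```python
# run.py -- build the DAG from the module and request specific outputs.
import pandas as pd

from hamilton import driver

import my_functions

dr = driver.Driver({}, my_functions)  # parses the module into a function graph
df = dr.execute(  # only the functions needed for the requested outputs are run
    ["avg_3wk_spend", "spend_per_signup"],
    inputs={
        "spend": pd.Series([10.0, 20.0, 30.0, 40.0]),
        "signups": pd.Series([1, 2, 4, 8]),
    },
)
print(df)
```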

A common use of Airflow is to help with machine learning/data science. Running such workloads in production often requires complex workflows. A necessary design decision with Airflow is determining how to break up the workflow into Airflow tasks. Create too many and you increase scheduling and execution overhead (e.g. moving lots of data); create too few and you get monolithic tasks that can take a while to run, but are probably more efficient to execute. The trade-off here is Airflow DAG complexity versus code complexity within each of the tasks. This makes debugging and reasoning about the workflow harder, especially if you didn’t author the initial Airflow DAG. More often than not, the initial task structure of the Airflow DAG becomes fixed, because refactoring the task code becomes prohibitive!

While simpler DAGs such as A->B->C are desirable, there is an inherent tension between the structure’s simplicity and the amount of code per task. The more code per task, the harder it is to identify points of failure, at the trade-off of potential computational efficiencies; and in the case of failures, retries grow in cost with the “size” of the task.

Airflow DAG structure choices: how many tasks? how much code per task? Image by author.

Instead, what if you could simultaneously wrangle the complexity within an Airflow task, no matter the size of the code within it, and gain the flexibility to easily change the Airflow DAG shape with minimal effort? This is where Hamilton comes in.

With Hamilton, you can replace the code within each Airflow task with a Hamilton DAG, where Hamilton handles the “micro” orchestration of the code within the task. Note: Hamilton actually enables you to logically model everything that you’d want an Airflow DAG to do. More on that below.

To use Hamilton, you load a Python module that contains your Hamilton functions, instantiate a Hamilton Driver, and execute a Hamilton DAG within an Airflow task, all in a few lines of code. By using Hamilton, you can write your data transformations at an arbitrary granularity, allowing you to inspect in greater detail what each Airflow task is doing.

Specifically, the mechanics of the code are:

  1. Import your function modules.
  2. Pass them to the Hamilton Driver to build the DAG.
  3. Then, call Driver.execute() with the outputs you want computed from the DAG you’ve defined.

Let’s look at some code that creates a single-node Airflow DAG but uses Hamilton to train and evaluate an ML model:
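(The original post embeds this code as a gist; below is a minimal reconstruction under stated assumptions: the train_evaluate module and its trained_model and evaluation_metrics outputs are hypothetical names.)

```python
# absenteeism_dag.py -- a sketch of a single-task Airflow DAG that delegates
# the "micro" orchestration within the task to Hamilton.
from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2023, 7, 1), schedule=None, catchup=False)
def absenteeism_prediction():
    @task
    def train_and_evaluate_model() -> dict:
        from hamilton import base, driver

        import train_evaluate  # hypothetical module of Hamilton functions

        dr = driver.Driver(
            {},  # Hamilton DAG configuration
            train_evaluate,
            # return a dict of outputs instead of the default DataFrame
            adapter=base.SimplePythonGraphAdapter(base.DictResult()),
        )
        # Hamilton resolves which functions to run for these outputs.
        results = dr.execute(["trained_model", "evaluation_metrics"])
        return results["evaluation_metrics"]

    train_and_evaluate_model()


absenteeism_prediction()
```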

Now, we didn’t show the Hamilton code here, but the benefits of this approach are:

  1. Unit & integration testing. Hamilton, through its naming and type annotation requirements, pushes developers to write modular Python code. This results in Python modules well-suited for unit testing. Once your Python code is unit tested, you can develop integration tests to ensure it behaves properly in your Airflow tasks. In contrast, testing code contained in an Airflow task is less trivial, especially in CI/CD settings, since it requires having access to an Airflow environment.
  2. Reuse data transformations. This approach keeps the data transformation code in Python modules, separated from the Airflow DAG file. This means the code is also runnable outside of Airflow! If you come from the analytics world, it should feel similar to developing and testing SQL queries in an external .sql file, then loading it into your Airflow Postgres operators.
  3. Reorganize your Airflow DAG easily. The lift required to change your Airflow DAG is now much lower. If you logically model everything in Hamilton, e.g. an end-to-end machine learning pipeline, it’s only a matter of determining how much of this Hamilton DAG needs to be computed in each Airflow task. For example, you can go from one monolithic Airflow task to a few tasks, or to many; all that needs to change is what you request from Hamilton, since your Hamilton DAG needn’t change at all! See the sketch after this list.
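As a sketch of point 3, re-cutting one monolithic task into two only changes the execute calls. We reuse the hypothetical train_evaluate module from above; "feature_df" is a hypothetical node name:

```python
from hamilton import base, driver

import train_evaluate  # hypothetical module, as in the sketch above

dr = driver.Driver(
    {}, train_evaluate, adapter=base.SimplePythonGraphAdapter(base.DictResult())
)

# Before: one monolithic Airflow task computes the whole pipeline.
results = dr.execute(["trained_model", "evaluation_metrics"])

# After: Airflow task 1 materializes the features...
features = dr.execute(["feature_df"])
# ...and Airflow task 2 resumes from them, short-circuiting that node.
results = dr.execute(
    ["trained_model", "evaluation_metrics"],
    overrides={"feature_df": features["feature_df"]},
)
```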

In most data science projects, it would be impossible to write the DAG of the final system from day one, since requirements will change. For example, the data science team might want to try different feature sets for their model. Until the list is set and finalized, it is probably undesirable to have the feature set in your source code and under version control; configuration files would be preferable.

Airflow supports default and runtime DAG configurations and will log these settings to make every DAG run reproducible. However, adding configurable behaviors requires adding conditional statements and complexity to your Airflow task code. This code might become obsolete during the project or only be useful in particular scenarios, ultimately decreasing your DAG’s readability.

In contrast, Hamilton can use Airflow’s runtime configuration to execute different data transformations from the function graph on the fly, as sketched below. This layered approach can greatly increase the expressivity of Airflow DAGs while maintaining structural simplicity. Alternatively, Airflow can dynamically generate new DAGs from configurations, but this can decrease observability, and some of these features remain experimental.
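For instance, an Airflow task can read the run’s configuration and hand it to the Hamilton Driver. A sketch, where the config keys and the features_module are our assumptions:

```python
from airflow.decorators import task
from airflow.operators.python import get_current_context


@task
def create_features() -> dict:
    from hamilton import base, driver

    import features_module  # hypothetical module of Hamilton functions

    context = get_current_context()
    run_config = context["dag_run"].conf or {}  # set via the Airflow UI or CLI

    # The config can switch between alternative implementations in the
    # function graph; the requested outputs select the feature set to compute.
    dr = driver.Driver(
        {"imputation": run_config.get("imputation", "mean")},
        features_module,
        adapter=base.SimplePythonGraphAdapter(base.DictResult()),
    )
    feature_set = run_config.get("feature_set", ["age", "absence_hours"])
    return dr.execute(feature_set)
```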

Airflow UI for DAG run configuration. Image by author.

If you work in a hand-off model, this approach promotes a separation of concerns between the data engineers responsible for the Airflow production system and the data scientists responsible for developing business solutions by writing Hamilton code. Having this separation can also improve data consistency and reduce code duplication. For example, a single Airflow DAG can be reused with different Hamilton modules to create different models. Similarly, the same Hamilton data transformations can be reused across different Airflow DAGs to power dashboards, APIs, applications, etc.

Below are two images. The first illustrates the high-level Airflow DAG containing two nodes. The second displays the low-level Hamilton DAG of the Python module evaluate_model imported in the Airflow task train_and_evaluate_model.

1. Airflow UI: Absenteeism Airflow DAG
2. Hamilton driver visualization: function graph for evaluate_model.py

Data science projects produce a lot of data artifacts: datasets, performance evaluations, figures, trained models, etc. The artifacts needed will change over the course of the project life cycle (data exploration, model optimization, production debugging, etc.). This is a problem for Airflow, since removing a task from a DAG will delete its metadata history and break the artifact lineage. In certain scenarios, producing unnecessary or redundant data artifacts can incur significant computation and storage costs.

Hamilton can provide the needed flexibility for data artifact generation through its data saver API. Functions decorated with @save_to.* gain the ability to store their output; one need only request this functionality via Driver.execute(). In the code below, calling validation_predictions_table will return the table, while calling the output_name_ value of save_validation_predictions will return the table and save it to .csv.
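(The original code is embedded as a gist; here is a minimal sketch of the pattern, in which the function body, the model type, and the validation_predictions_path input are our assumptions.)

```python
import pandas as pd
from sklearn.base import BaseEstimator

from hamilton.function_modifiers import save_to, source


@save_to.csv(
    path=source("validation_predictions_path"),  # file path, provided at execution
    output_name_="save_validation_predictions",  # name of the extra "saver" node
)
def validation_predictions_table(
    fit_model: BaseEstimator, validation_df: pd.DataFrame
) -> pd.DataFrame:
    """Validation-set predictions of the trained model."""
    return pd.DataFrame({"prediction": fit_model.predict(validation_df)})


# dr.execute(["validation_predictions_table"], ...)  -> returns the table only
# dr.execute(["save_validation_predictions"], ...)   -> returns the table and
#   writes it to the .csv at "validation_predictions_path"
```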

This added flexibility allows users to easily toggle which artifacts are generated, and it can be done directly through the Airflow runtime configuration, without editing the Airflow DAG or Hamilton modules.

Additionally, the fine-grained Hamilton function graph allows for precise data lineage & provenance. Utility functions what_is_downstream_of() and what_is_upstream_of() help visualize and programmatically explore data dependencies, as in the sketch below. We point readers here for more detail.
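For example, on a Hamilton Driver dr as constructed in the earlier sketches (the node names are ours):

```python
# Programmatically explore the function graph. Both calls return the
# Hamilton variables (nodes) reachable from the given node(s).
downstream = dr.what_is_downstream_of("validation_df")
upstream = dr.what_is_upstream_of("evaluation_metrics")
print(sorted(var.name for var in downstream))

# Render just the subgraph downstream of a node (requires graphviz installed).
dr.display_downstream_of("validation_df", output_file_path="./downstream.png")
```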

Hopefully by now we’ve impressed on you that combining Hamilton with Airflow will help you with Airflow’s DAG creation & maintainability challenges. Since this is a short post, to wrap things up, let’s move on to the code we have in the Hamilton repository for you.

To help you get up and running, we have an example of how to use Hamilton with Airflow. It should cover all the basics you need to get started. The README includes how to set up Airflow with Docker, so you don’t need to worry about installing dependencies just to play with the example.

As for the code in the example, it contains two Airflow DAGs: one showcasing a basic Hamilton “how-to” to create “features” for training a model, and the other a more complete machine learning project example that runs a full end-to-end pipeline of creating features and then fitting and evaluating a model. For both examples, you’ll find the Hamilton code under the plugins folder.

What you should expect to see in the Airflow example. Image by author.

If you have questions or need help, please join our Slack. Otherwise, to learn more about Hamilton’s features and functionality, we refer you to Hamilton’s documentation.

Thanks for taking a look at this post. If you want to dive deeper, or want to learn more about Hamilton, we have the following links for you to browse!

