
The Docker Compose of ETL: Meerschaum Compose | by Bennett Meares | Jun, 2023


An example Meerschaum Compose file.
An example Meerschaum Compose project for ETL on weather data.

Pipes

The humble pipe is Meerschaum’s abstraction for incremental ETL. Pipes have input and output connectors and store parameters to configure the behavior of their syncing processes. These parameters can be as simple as a SQL query or may include custom keys for use in your plugins.

Meerschaum pipes created by Meerschaum Compose.
Pipes from the above Compose project displayed in the web UI

Because pipes’ metadata are stored alongside their tables, they’re easily editable (whether via edit pipes or on the web UI), which facilitates prototyping. But this dynamic nature introduces the same problem described at the beginning of this article: in order to scale development, a Compose file is needed to define a project’s components in a way that can be easily version-controlled.

According to the Meerschaum Compose specification, pipes are defined in a list under the keys sync:pipes. Each item defines the keys and parameters needed to construct the pipe, like a blueprint for what you expect the pipes in the database to reflect.

For example, the following snippet would define a pipe that syncs a table weather from a remote PostgreSQL database (defined below as sql:source) to a local SQLite file (sql:dest in this project).

sync:
  pipes:
    - connector: "sql:source"
      metric: "weather"
      target: "weather"
      columns:
        datetime: "timestamp"
        station: "station"
      parameters:
        fetch:
          backtrack_minutes: 1440
          query: |-
            SELECT timestamp, station, temperature
            FROM weather

config:
  meerschaum:
    instance: "sql:dest"
    connectors:
      sql:
        source: "postgresql://user:pass@host:5432/db"
        dest: "sqlite:////tmp/dest.db"

This example would incrementally update a table named weather, using the datetime axis timestamp for range bounding (with one day of backtracking). This column plus the ID column station would together make up a composite primary key used for de-duplication.

The URI is written literally just for the sake of example; if you’re committing a compose file, reference either an environment variable (e.g. $SECRET_URI) or your host Meerschaum configuration (e.g. MRSM{meerschaum:connectors:sql:source}).
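
To make the syncing behavior described above more concrete, here is a minimal sketch in plain Python. It is not Meerschaum's implementation: the function incremental_sync and the sample rows are hypothetical, and it only inserts new rows (a real sync may also update changed rows). It assumes the sync window starts at the newest existing timestamp minus backtrack_minutes and that rows are de-duplicated on the composite key (timestamp, station).

```python
from datetime import datetime, timedelta


def incremental_sync(existing, source_rows, backtrack_minutes=1440,
                     datetime_col="timestamp", id_col="station"):
    """Return `existing` plus the new rows from `source_rows` (hypothetical helper)."""

    def key(row):
        # Composite key: the datetime axis plus the ID column.
        return (row[datetime_col], row[id_col])

    seen = {key(row) for row in existing}

    # The window begins `backtrack_minutes` before the newest synced row.
    begin = None
    if existing:
        latest = max(row[datetime_col] for row in existing)
        begin = latest - timedelta(minutes=backtrack_minutes)

    result = list(existing)
    for row in source_rows:
        if begin is not None and row[datetime_col] < begin:
            continue  # Older than the backtracking window.
        if key(row) in seen:
            continue  # Already synced (duplicate composite key).
        seen.add(key(row))
        result.append(row)
    return result


existing = [
    {"timestamp": datetime(2023, 6, 1, 12), "station": "KATL", "temperature": 30.1},
]
source_rows = [
    # Outside the one-day window: skipped.
    {"timestamp": datetime(2023, 5, 1, 12), "station": "KATL", "temperature": 22.0},
    # Same (timestamp, station) as an existing row: skipped.
    {"timestamp": datetime(2023, 6, 1, 12), "station": "KATL", "temperature": 30.1},
    # Late-arriving row inside the window: added.
    {"timestamp": datetime(2023, 6, 1, 6), "station": "KGSP", "temperature": 25.5},
    # Newer row: added.
    {"timestamp": datetime(2023, 6, 1, 13), "station": "KATL", "temperature": 31.0},
]
synced = incremental_sync(existing, source_rows)
```

The backtracking window is what lets late-arriving rows (like the KGSP reading) be picked up without rescanning the whole source table.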

Connectors

First, a quick refresher on Meerschaum connectors: you can define connectors in several ways, the most popular of which is through environment variables. Suppose you define your connection secrets in an environment file:

export MRSM_SQL_REMOTE='postgresql://user:pass@host:5432/db'
export MRSM_FOO_BAR='{
    "username": "abc",
    "password": "def"
}'

The first environment variable MRSM_SQL_REMOTE would define the connector sql:remote. If you sourced this file, you could verify this connector with the command mrsm show connectors sql:remote.

The second variable is an example of how to define a custom FooConnector, which you could create using the @make_connector decorator in a plugin. Custom connectors are a powerful tool, but for now, here’s the basic structure:

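from meerschaum.connectors import make_connector, Connector

@make_connector
class FooConnector(Connector):
    # Attributes that must be present in the connector's definition.
    REQUIRED_ATTRIBUTES = ['username', 'password']

    def fetch(self, pipe, **kwargs):
        # Populate this list with dictionaries (one per row).
        docs = []
        return docs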
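
As a rough illustration of the naming convention in the environment file above, the sketch below maps variables of the form MRSM_<TYPE>_<LABEL> to connector keys of the form type:label. The function parse_connector_env is hypothetical and is not Meerschaum's own parser; it assumes a value is either a JSON object of attributes or a plain URI, and it ignores the fact that Meerschaum reserves some MRSM_ variables (such as MRSM_PLUGINS_DIR) for other purposes.

```python
import json


def parse_connector_env(environ, prefix="MRSM_"):
    """Map MRSM_<TYPE>_<LABEL> variables to {"type:label": attributes} (hypothetical)."""
    connectors = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue
        # Split only once so labels may themselves contain underscores.
        parts = name[len(prefix):].split("_", 1)
        if len(parts) != 2 or not all(parts):
            continue
        conn_type, label = parts[0].lower(), parts[1].lower()

        value = value.strip()
        if value.startswith("{"):
            attributes = json.loads(value)  # JSON object of attributes.
        else:
            attributes = {"uri": value}     # Otherwise treat it as a URI.
        connectors[f"{conn_type}:{label}"] = attributes
    return connectors


# A plain dictionary stands in for os.environ here.
environ = {
    "MRSM_SQL_REMOTE": "postgresql://user:pass@host:5432/db",
    "MRSM_FOO_BAR": '{"username": "abc", "password": "def"}',
    "HOME": "/home/user",
}
parsed = parse_connector_env(environ)
```

Passing os.environ instead of the sample dictionary would apply the same mapping to your real environment.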

So we’ve just reviewed how to define connectors in our host environment. Let’s see how to make these host connectors available in a Meerschaum project. In the compose file, all of the connectors we need for our project are defined under config:meerschaum:connectors. Use the MRSM{} syntax to reference the keys from your host environment and pass them into the project.

config:
  meerschaum:
    instance: "sql:app"
    connectors:
      sql:
        app: MRSM{meerschaum:connectors:sql:remote}
      foo:
        bar: MRSM{meerschaum:connectors:foo:bar}
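
One way to picture what the MRSM{} references do is as a lookup into the host configuration by a colon-separated path of keys. The sketch below is an assumption-laden illustration, not Meerschaum's actual substitution logic: the helpers lookup and resolve and the sample dictionaries are hypothetical, and only values that consist entirely of one MRSM{...} reference are handled.

```python
import re

# Matches a value that is exactly one reference, e.g. "MRSM{a:b:c}".
MRSM_PATTERN = re.compile(r"MRSM\{([^{}]+)\}")


def lookup(config, keys):
    """Walk a nested dictionary along a list of keys."""
    value = config
    for key in keys:
        value = value[key]
    return value


def resolve(node, host_config):
    """Recursively replace MRSM{...} references with values from host_config."""
    if isinstance(node, dict):
        return {key: resolve(value, host_config) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(value, host_config) for value in node]
    if isinstance(node, str):
        match = MRSM_PATTERN.fullmatch(node.strip())
        if match:
            return lookup(host_config, match.group(1).split(":"))
    return node


# Hypothetical host configuration holding the real secrets.
host_config = {
    "meerschaum": {
        "connectors": {
            "sql": {"remote": {"uri": "postgresql://user:pass@host:5432/db"}},
            "foo": {"bar": {"username": "abc", "password": "def"}},
        },
    },
}
# The project configuration, mirroring the compose file above.
compose_config = {
    "meerschaum": {
        "instance": "sql:app",
        "connectors": {
            "sql": {"app": "MRSM{meerschaum:connectors:sql:remote}"},
            "foo": {"bar": "MRSM{meerschaum:connectors:foo:bar}"},
        },
    },
}
resolved = resolve(compose_config, host_config)
```

The point of the indirection is that the compose file can be committed as-is, while the secrets stay in the host configuration.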

Plugins

Meerschaum is easily extendable via plugins, which are Python modules. Plugins may fetch data, implement custom connectors, and/or extend Meerschaum (e.g. custom actions, flags, API endpoints, etc.).

Meerschaum supports multiple plugins directories (via MRSM_PLUGINS_DIR), which may be set under the plugins_dir key in mrsm-compose.yaml (the default is a directory named plugins).

Storing your plugins within a Compose project makes it clear how you expect your plugins to be used. For example, the Compose file in the MongoDBConnector project demonstrates how the custom connector is used as both a connector and an instance.

Package Management

When you first start using Meerschaum Compose, the first thing you’ll notice is that it starts installing a fair number of Python packages. Don’t worry about your environment: everything is installed into virtual environments within your project’s root subdirectory (a bit ironic, right?). You can install your plugins’ dependencies with mrsm compose init.

To share packages between projects, set the key root_dir in mrsm-compose.yaml to a new path. Deleting this root directory will effectively uninstall all of the packages that Compose downloaded, keeping your host environment intact.

