Managing Deep Learning Models Easily With TOML Configurations | by Shubham Panchal | Jun, 2023

You may never need those long CLI args for your

Photo by Scott Graham on Unsplash

Managing deep learning models can be difficult because of the large number of parameters and settings needed across all modules. The training module might need parameters like batch_size or num_epochs, or parameters for the learning rate scheduler. Similarly, the data preprocessing module might need train_test_split or parameters for image augmentation.

A naive way to manage or introduce these parameters into the pipeline is to pass them as CLI arguments while running the scripts. Command-line arguments can be tedious to type, and keeping all parameters in a single place may not be possible. TOML files provide a cleaner way to manage configurations: scripts can load the parts of the configuration they need as a Python dict, without boilerplate code to read and parse command-line args.

In this blog, we'll explore the use of TOML for configuration files and how we can use them efficiently across training and deployment scripts.

TOML, which stands for Tom's Obvious Minimal Language, is a file format designed specifically for configuration files. A TOML file is quite similar to a YAML/YML file: both can store key-value pairs in a tree-like hierarchy. An advantage of TOML over YAML is its readability, which becomes important when there are multiple nested levels.

Fig. 1. The same model configuration written in TOML (left) and YAML (right). TOML lets us write key-value pairs at the same indentation level regardless of the hierarchy.

Personally, apart from the improved readability, I find no practical reason to prefer TOML over YAML. Using YAML is perfectly fine; here's a Python package for parsing YAML.

There are two advantages of using TOML to store model, data and deployment configuration for ML models:

Managing all configurations in a single file: With TOML files, we can create several groups of settings required by different modules. For instance, in Figure 1, the settings related to the model's training procedure are nested under the [train] table; similarly, the port and host required for deploying the model are stored under [deploy]. We need not jump between or to change their parameters; instead, we can manage all settings globally from a single TOML configuration file.

This can be very helpful if we're training the model on a virtual machine, where code editors or IDEs aren't available for editing files. A single config file is easy to edit with vim or nano, which are available on most VMs.

To read the configuration from a TOML file, we can use two Python packages, toml and munch. toml reads the TOML file and returns its contents as a Python dict. munch converts that dict so its elements can be accessed as attributes. For instance, instead of writing config[ "training" ][ "num_epochs" ], we can simply write config.training.num_epochs, which improves readability.
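To illustrate what attribute-style access involves, here is a rough standard-library stand-in for munch.munchify, built on types.SimpleNamespace. The function name to_namespace is hypothetical and this is not munch's implementation; it only shows the recursive conversion idea.

```python
from types import SimpleNamespace

def to_namespace(value):
    """Recursively convert dicts (and dicts inside lists) into SimpleNamespace objects."""
    if isinstance(value, dict):
        return SimpleNamespace(**{key: to_namespace(val) for key, val in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    # Scalars (ints, floats, strings, ...) are returned unchanged.
    return value

config = to_namespace({"training": {"num_epochs": 10, "batch_size": 32}})
print(config.training.num_epochs)  # 10
```

One design difference: a Munch object is still a dict subclass, so config["training"] keeps working alongside config.training, whereas a SimpleNamespace only supports attribute access. Keys that are not valid Python identifiers (e.g. "warmup-steps") also have to be read with getattr in this sketch.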

Consider the following file structure,

- project_config.toml

project_config.toml contains the configuration for our ML project, like,

[data]
vocab_size = 5589
seq_length = 10
test_split = 0.3
data_path = "dataset/"
data_tensors_path = "data_tensors/"

[model]
embedding_dim = 256
num_blocks = 5
num_heads_in_block = 3

[train]
num_epochs = 10
batch_size = 32
learning_rate = 0.001
checkpoint_path = "auto"

In , we create a function that returns the munchified version of this configuration, using toml and munch,

$> pip install toml munch

import toml
import munch

def load_global_config( filepath : str = "project_config.toml" ):
    # Parse the TOML file into a nested dict, then wrap it for attribute-style access
    return munch.munchify( toml.load( filepath ) )

def save_global_config( new_config , filepath : str = "project_config.toml" ):
    # Overwrite the TOML file with the (possibly modified) configuration
    with open( filepath , "w" ) as file:
        toml.dump( new_config , file )

Now, in any of our project files, like or , we can load this configuration,

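import os
from datetime import datetime

from config import load_global_config

config = load_global_config()

batch_size = config.train.batch_size
lr = config.train.learning_rate

if config.train.checkpoint_path == "auto":
    # Make a directory named after the current timestamp
    # (this timestamp format is just one possible choice)
    checkpoint_path = datetime.now().strftime( "%Y-%m-%d_%H-%M-%S" )
    os.makedirs( checkpoint_path , exist_ok=True )
else:
    checkpoint_path = config.train.checkpoint_path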

The output of print( toml.load( filepath ) ) is,

{'data': {'data_path': 'dataset/',
          'data_tensors_path': 'data_tensors/',
          'seq_length': 10,
          'test_split': 0.3,
          'vocab_size': 5589},
 'model': {'embedding_dim': 256, 'num_blocks': 5, 'num_heads_in_block': 3},
 'train': {'batch_size': 32,
           'checkpoint_path': 'auto',
           'learning_rate': 0.001,
           'num_epochs': 10}}

If you're using MLOps tools like W&B Tracking or MLflow, keeping the configuration as a dict is helpful because we can pass it directly as an argument.
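W&B's wandb.init accepts a config dict directly, while MLflow's mlflow.log_params expects a flat mapping of parameter names to values. A minimal sketch, assuming you want one MLflow parameter per leaf key with dotted names (flatten_config is a hypothetical helper, not part of either library):

```python
def flatten_config(config: dict, parent_key: str = "", sep: str = ".") -> dict:
    """Flatten nested config tables into dotted keys,
    e.g. {'train': {'batch_size': 32}} -> {'train.batch_size': 32}."""
    flat = {}
    for key, value in config.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested tables (a Munch is a dict subclass, so it is handled too).
            flat.update(flatten_config(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

config = {
    "model": {"embedding_dim": 256, "num_blocks": 5, "num_heads_in_block": 3},
    "train": {"batch_size": 32, "learning_rate": 0.001},
}
params = flatten_config(config)
print(params)

# With MLflow installed, the flat dict could then be logged inside a run:
#   import mlflow
#   with mlflow.start_run():
#       mlflow.log_params(params)
```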

I hope you'll consider using TOML configurations in your next ML project! It's a clean way to manage settings that are either global or local to your training, deployment or inference scripts.

Instead of writing long CLI arguments, the scripts can load the configuration directly from the TOML file. If we wish to train two versions of a model with different hyperparameters, we just need to change the TOML file in . I've started using TOML files in my recent projects, and experimentation has become faster. MLOps tools can also manage versions of a model along with their configurations, but the approach described above stands out for its simplicity and requires minimal changes to existing projects.
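When trying several hyperparameter settings, one option is to derive each variant from a base config instead of hand-editing it every time. A minimal sketch, assuming overrides are given as dotted keys; make_variant is a hypothetical helper, and each variant could afterwards be written to its own file with save_global_config.

```python
import copy

def make_variant(base_config: dict, overrides: dict) -> dict:
    """Return a deep copy of base_config with dotted-key overrides applied,
    e.g. {"train.learning_rate": 0.01}. The base config is left untouched."""
    variant = copy.deepcopy(base_config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        table = variant
        for name in parents:
            table = table[name]  # raises KeyError for a mistyped section name
        table[leaf] = value
    return variant

base = {"train": {"num_epochs": 10, "batch_size": 32, "learning_rate": 0.001}}
variant_a = make_variant(base, {"train.learning_rate": 0.01})
variant_b = make_variant(base, {"train.batch_size": 64})
print(variant_a["train"])
print(variant_b["train"])
```

The deep copy matters: without it, both variants would share the nested [train] dict with the base config.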

I hope you enjoyed reading. Have a nice day ahead!
