Lightning CLI and config files

Another source of boilerplate that Lightning can help reduce is the implementation of command line tools for training. Furthermore, it provides a standardized way to configure trainings using a single file that includes the settings for the Trainer as well as for the user extended LightningModule and LightningDataModule classes. The full configuration is automatically saved in the log directory, which greatly simplifies the reproducibility of experiments.

The main requirement for user extended classes to be made configurable is that all relevant init arguments must have type hints. This is not a very demanding requirement, since adding type hints is good practice anyway. As a bonus, if the arguments are described in the docstring, the help of the training tool will display those descriptions.

Warning

LightningCLI is in beta and subject to change.


LightningCLI

The implementation of training command line tools is done via the LightningCLI class. The minimal installation of pytorch-lightning does not include this support. To enable it, either install lightning with the all extras requirement or install the package jsonargparse[signatures].

In the case in which the user’s LightningModule class implements all required *_dataloader methods, a trainer.py tool can be as simple as:

from pytorch_lightning.utilities.cli import LightningCLI

cli = LightningCLI(MyModel)

The help of the tool describing all configurable options and default values can be shown by running python trainer.py --help. Default options can be changed by providing individual command line arguments. However, it is better practice to create a configuration file and provide this to the tool. A way to do this would be:

# Dump default configuration to have as reference
python trainer.py --print_config > default_config.yaml
# Create config including only options to modify
nano config.yaml
# Run training using created configuration
python trainer.py --config config.yaml

The instantiation of the LightningCLI class takes care of parsing command line and config file options, instantiating the classes, setting up a callback to save the config in the log directory, and finally running trainer.fit(). The resulting cli object can be used, for instance, to get the result of fit via cli.fit_result.
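
A minimal sketch of this, continuing the trainer.py example above:

from pytorch_lightning.utilities.cli import LightningCLI

cli = LightningCLI(MyModel)
# At this point trainer.fit() has already been run and its result is available
fit_result = cli.fit_result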

After multiple trainings with different configurations, each run will have a config.yaml file in its respective log directory. This file serves as a reference for the exact settings that were used in each particular run, and can also be used to trivially reproduce a training, e.g.:

python trainer.py --config lightning_logs/version_7/config.yaml

If a separate LightningDataModule class is required, the trainer tool just needs a small modification as follows:

from pytorch_lightning.utilities.cli import LightningCLI

cli = LightningCLI(MyModel, MyDataModule)

The start of a possible implementation of MyModel, including the recommended argument descriptions in the docstring, could be the one below. Note that by using type hints and docstrings there is no need to duplicate this information in order to define the configurable arguments.

from typing import List

from pytorch_lightning import LightningModule

class MyModel(LightningModule):

    def __init__(
        self,
        encoder_layers: int = 12,
        decoder_layers: List[int] = [2, 4]
    ):
        """Example encoder-decoder model

        Args:
            encoder_layers: Number of layers for the encoder
            decoder_layers: Number of layers for each decoder block
        """
        super().__init__()
        self.save_hyperparameters()

With this model class, the help of the trainer tool would look as follows:

$ python trainer.py --help
usage: trainer.py [-h] [--print_config] [--config CONFIG]
                  [--trainer.logger LOGGER]
                  ...

pytorch-lightning trainer command line tool

optional arguments:
  -h, --help            show this help message and exit
  --print_config        print configuration and exit
  --config CONFIG       Path to a configuration file in json or yaml format.
                        (default: null)

Customize every aspect of training via flags:
  ...
  --trainer.max_epochs MAX_EPOCHS
                        Stop training once this number of epochs is reached.
                        (type: int, default: 1000)
  --trainer.min_epochs MIN_EPOCHS
                        Force training for at least these many epochs (type: int,
                        default: 1)
  ...

Example encoder-decoder model:
  --model.encoder_layers ENCODER_LAYERS
                        Number of layers for the encoder (type: int, default: 12)
  --model.decoder_layers DECODER_LAYERS
                        Number of layers for each decoder block (type: List[int],
                        default: [2, 4])

The default configuration that the --print_config option gives is in yaml format, and for the example above it would look as follows:

$ python trainer.py --print_config
model:
  decoder_layers:
  - 2
  - 4
  encoder_layers: 12
trainer:
  accelerator: null
  accumulate_grad_batches: 1
  amp_backend: native
  amp_level: O2
  ...

Note that there is a section for each class (model and trainer) including all the init parameters of the class. This grouping is also used in the formatting of the help shown previously.

Trainer Callbacks and arguments with class type

A very important argument of the Trainer class is callbacks. In contrast to simpler arguments that just take numbers or strings, callbacks expects a list of instances of Callback subclasses. To specify this kind of argument in a config file, each callback must be given as a dictionary that includes a class_path entry with the import path of the class and, optionally, an init_args entry with the arguments required to instantiate it. A simple configuration file that defines a couple of callbacks would be the following:

trainer:
  callbacks:
    - class_path: pytorch_lightning.callbacks.EarlyStopping
      init_args:
        patience: 5
    - class_path: pytorch_lightning.callbacks.LearningRateMonitor
      init_args:
        ...

Similar to the callbacks, any argument in the Trainer or in user extended LightningModule and LightningDataModule classes whose type hint is a class can be configured in the same way using class_path and init_args.

Multiple models and/or datasets

In the previous examples LightningCLI works only with a single model and datamodule class. However, in many cases the objective is to easily run experiments with multiple models and datasets. For these cases the tool can be configured such that the model and/or the datamodule are specified by an import path and init arguments. For example, with a tool implemented as:

from pytorch_lightning.utilities.cli import LightningCLI

cli = LightningCLI(
    MyModelBaseClass,
    MyDataModuleBaseClass,
    subclass_mode_model=True,
    subclass_mode_data=True
)

A possible config file could be as follows:

model:
  class_path: mycode.mymodels.MyModel
  init_args:
    decoder_layers:
    - 2
    - 4
    encoder_layers: 12
data:
  class_path: mycode.mydatamodules.MyDataModule
  init_args:
    ...
trainer:
  callbacks:
    - class_path: pytorch_lightning.callbacks.EarlyStopping
      init_args:
        patience: 5
    ...

Only model classes that are a subclass of MyModelBaseClass would be allowed, and similarly only subclasses of MyDataModuleBaseClass for the datamodule. If LightningModule and LightningDataModule are given as the base classes, then the tool would allow any lightning module and data module.
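
For instance, a sketch of a fully generic tool that, as just described, accepts any lightning module and data module by passing the Lightning base classes directly:

from pytorch_lightning import LightningDataModule, LightningModule
from pytorch_lightning.utilities.cli import LightningCLI

# Any subclass of LightningModule / LightningDataModule can be selected in the
# config through its class_path and init_args
cli = LightningCLI(
    LightningModule,
    LightningDataModule,
    subclass_mode_model=True,
    subclass_mode_data=True
)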

Tip

Note that with the subclass modes the --help option does not show information for a specific subclass. To get help for a subclass, the options --model.help and --data.help can be used, followed by the desired class path. Similarly, --print_config does not include the settings for a particular subclass. To include them, the class path should be given before the --print_config option. Examples for both help and print config are:

$ python trainer.py --model.help mycode.mymodels.MyModel
$ python trainer.py --model mycode.mymodels.MyModel --print_config

Models with multiple submodules

Many use cases require several modules, each with its own configurable options. One possible way to handle this with LightningCLI is to implement a single module that takes each of the submodules as init parameters. Since these init parameters have classes as their types, in the configuration they are specified with class_path and init_args entries. For instance, a model could be implemented as:

from pytorch_lightning import LightningModule

# EncoderBaseClass and DecoderBaseClass would be defined elsewhere by the user
class MyMainModel(LightningModule):

    def __init__(
        self,
        encoder: EncoderBaseClass,
        decoder: DecoderBaseClass
    ):
        """Example encoder-decoder submodules model

        Args:
            encoder: Instance of a module for encoding
            decoder: Instance of a module for decoding
        """
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

If the CLI is implemented as LightningCLI(MyMainModel) the configuration would be as follows:

model:
  encoder:
    class_path: mycode.myencoders.MyEncoder
    init_args:
      ...
  decoder:
    class_path: mycode.mydecoders.MyDecoder
    init_args:
      ...

It is also possible to combine subclass_mode_model=True and submodules, thereby having two levels of class_path.

Customizing LightningCLI

The init parameters of the LightningCLI class can be used to customize a few things, namely: the description of the tool, enabling parsing of environment variables, and additional arguments to instantiate the trainer and the configuration parser.
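
As a sketch, and assuming init parameter names such as description, env_parse and trainer_defaults (the LightningCLI API reference has the exact names and types), such a customization could look like:

from pytorch_lightning.utilities.cli import LightningCLI

# Parameter names below are assumptions to be checked against the API reference
cli = LightningCLI(
    MyModel,
    description='Tool to train MyModel',  # description shown by --help
    env_parse=True,                       # also parse options from environment variables
    trainer_defaults={'max_epochs': 10}   # change a default used to instantiate the trainer
)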

Nevertheless, the init arguments are not enough for many use cases. For this reason the class is designed so that it can be extended to customize different parts of the command line tool. The argument parser class used by LightningCLI is LightningArgumentParser, which is an extension of python’s argparse, so arguments can be added using the add_argument() method. In contrast to argparse, it has additional methods for adding arguments, for example add_class_arguments(), which adds all arguments from the init of a class, though it requires the parameters to have type hints. For more details about this please refer to the respective documentation.

The LightningCLI class has the add_arguments_to_parser() method which can be implemented to include more arguments. After parsing, the configuration is stored in the config attribute of the class instance. The LightningCLI class also has two methods that can be used to run code before and after trainer.fit is executed: before_fit() and after_fit(). A realistic example for these would be to send an email before and after the execution of fit. The code would be something like:

from pytorch_lightning.utilities.cli import LightningCLI

class MyLightningCLI(LightningCLI):

    def add_arguments_to_parser(self, parser):
        parser.add_argument('--notification_email', default='will@email.com')

    def before_fit(self):
        send_email(
            address=self.config['notification_email'],
            message='trainer.fit starting'
        )

    def after_fit(self):
        send_email(
            address=self.config['notification_email'],
            message='trainer.fit finished'
        )

cli = MyLightningCLI(MyModel)
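
Along the same lines, a sketch of how add_class_arguments(), mentioned earlier, could expose the init arguments of an extra, hypothetical helper class under its own group of options:

from pytorch_lightning.utilities.cli import LightningCLI

# Hypothetical helper class; any class whose init arguments have type hints works
class Augmentations:

    def __init__(self, flip: bool = True, rotate_degrees: float = 0.0):
        self.flip = flip
        self.rotate_degrees = rotate_degrees

class MyLightningCLI(LightningCLI):

    def add_arguments_to_parser(self, parser):
        # All init arguments of Augmentations become --augmentations.* options
        parser.add_class_arguments(Augmentations, 'augmentations')

cli = MyLightningCLI(MyModel)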

Note that the config object self.config is a dictionary whose keys are global options or groups of options. It has the same structure as the yaml format described previously. This means for instance that the parameters used for instantiating the trainer class can be found in self.config['trainer'].
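
For example, a small sketch of reading a trainer setting from within one of the hook methods (max_epochs is one of the trainer options shown in the help above):

from pytorch_lightning.utilities.cli import LightningCLI

class MyLightningCLI(LightningCLI):

    def before_fit(self):
        # The parsed trainer options live under the 'trainer' key,
        # mirroring the yaml structure described previously
        print('Training for up to', self.config['trainer']['max_epochs'], 'epochs')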

Another case in which it might be desirable to extend LightningCLI is when the model and data module depend on a common parameter. For example, in some cases both classes require knowing the batch_size. Giving the same value twice in a config file is a burden and error prone. To avoid this, the parser can be configured so that a value is only given once and then propagated accordingly. With a tool implemented as shown below, the batch_size only has to be provided in the data section of the config.

from pytorch_lightning.utilities.cli import LightningCLI

class MyLightningCLI(LightningCLI):

    def add_arguments_to_parser(self, parser):
        parser.link_arguments('data.batch_size', 'model.batch_size')

cli = MyLightningCLI(MyModel, MyDataModule)

The linking of arguments is observed in the help of the tool, which for this example would look like:

$ python trainer.py --help
  ...
    --data.batch_size BATCH_SIZE
                          Number of samples in a batch (type: int, default: 8)

  Linked arguments:
    model.batch_size <-- data.batch_size
                          Number of samples in a batch (type: int)

Tip

The linking of arguments can be used for more complex cases, for example to derive a value via a function that takes multiple settings as input. For more details have a look at the API of link_arguments.
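
For instance, a sketch with hypothetical parameter names (data.image_height, data.image_width and model.input_size) where one value is derived from two settings through a compute_fn:

from pytorch_lightning.utilities.cli import LightningCLI

class MyLightningCLI(LightningCLI):

    def add_arguments_to_parser(self, parser):
        # Hypothetical parameters: the model's input_size is computed from two
        # data settings instead of being given separately in the config
        parser.link_arguments(
            ('data.image_height', 'data.image_width'),
            'model.input_size',
            compute_fn=lambda h, w: h * w
        )

cli = MyLightningCLI(MyModel, MyDataModule)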

Tip

Have a look at the LightningCLI class API reference to learn about other methods that can be extended to customize a CLI.