Plugins

Plugins allow custom integrations with the internals of the Trainer, such as a custom precision or distributed implementation.

Under the hood, the Lightning Trainer uses plugins in the training routine; they are added automatically depending on the provided Trainer arguments. For example:

# accelerator: GPUAccelerator
# training strategy: DDPStrategy
# precision: NativeMixedPrecisionPlugin
trainer = Trainer(gpus=4, precision=16)

We expose Accelerators and Plugins mainly for expert users that want to extend Lightning for:

  • New hardware (like TPU plugin)

  • Distributed backends (e.g. a backend not yet supported by PyTorch itself)

  • Clusters (e.g. customized access to the cluster’s environment interface)

There are two types of Plugins in Lightning with different responsibilities:

Strategy

  • Launching and teardown of training processes (if applicable)

  • Set up communication between processes (NCCL, GLOO, MPI, …)

  • Provide a unified communication interface for reduction, broadcast, etc.

  • Provide access to the wrapped LightningModule

PrecisionPlugin

  • Perform pre- and post-backward/optimizer step operations, such as scaling gradients

  • Provide context managers for forward, training_step, etc.

Furthermore, for multi-node training Lightning provides cluster environment plugins that allow the advanced user to configure Lightning to integrate with a custom cluster.
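For managed clusters, the built-in environments (listed under Cluster Environments below) can be passed to the Trainer directly. A minimal sketch, assuming a SLURM-managed cluster; constructor arguments and Trainer flags may differ between Lightning versions:

from pytorch_lightning import Trainer
from pytorch_lightning.plugins.environments import SLURMEnvironment

# rank, world size and the main address/port are read from SLURM's environment variables
trainer = Trainer(gpus=4, num_nodes=2, strategy="ddp", plugins=[SLURMEnvironment()])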

[Figure: overview of the Trainer's plugin components]

Create a custom plugin

Expert users may choose to extend an existing plugin by overriding its methods …

from pytorch_lightning.strategies import DDPStrategy


class CustomDDPStrategy(DDPStrategy):
    def configure_ddp(self):
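        # wrap the LightningModule with a user-defined DistributedDataParallel subclass;
        # MyCustomDistributedDataParallel is an illustrative placeholder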
        self._model = MyCustomDistributedDataParallel(
            self.model,
            device_ids=...,
        )

or by subclassing the base classes Strategy or PrecisionPlugin to create new ones. These custom plugins can then be passed into the Trainer directly or via a (custom) accelerator:

# custom plugins
trainer = Trainer(strategy=CustomDDPStrategy(), plugins=[CustomPrecisionPlugin()])

# fully custom accelerator and plugins
accelerator = MyAccelerator()
precision_plugin = MyPrecisionPlugin()
training_type_plugin = CustomDDPStrategy(accelerator=accelerator, precision_plugin=precision_plugin)
trainer = Trainer(strategy=training_type_plugin)

The full list of built-in plugins is given below.

Warning

The Plugin API is in beta and subject to change. For help setting up custom plugins/accelerators, please reach out to us at support@pytorchlightning.ai


Training Strategies

Strategy

Base class for all training type plugins that change the behaviour of the training, validation and test loop.

SingleDeviceStrategy

Strategy that handles communication on a single device.

ParallelStrategy

Plugin for training with multiple processes in parallel.

DataParallelStrategy

Implements data-parallel training in a single process, i.e., the model gets replicated to each device and each gets a split of the data.

DDPStrategy

Plugin for multi-process single-device training on one or multiple nodes.

DDP2Strategy

DDP2 behaves like DP within a single node, while synchronization across nodes behaves like in DDP.

DDPShardedStrategy

Optimizer and gradient sharded training provided by FairScale.

DDPSpawnShardedStrategy

Optimizer sharded training provided by FairScale.

DDPSpawnStrategy

Spawns processes using the torch.multiprocessing.spawn() method and joins processes after training finishes.

DeepSpeedStrategy

Provides capabilities to run training using the DeepSpeed library, with training optimizations for large billion parameter models.

HorovodStrategy

Plugin for Horovod distributed training integration.

SingleTPUStrategy

Strategy for training on a single TPU device.

TPUSpawnStrategy

Strategy for training multiple TPU devices using the torch.multiprocessing.spawn() method.
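A built-in strategy can be selected through the Trainer either by its string alias or by passing a configured instance. A minimal sketch, assuming GPU training; argument names may differ slightly between Lightning versions:

from pytorch_lightning import Trainer
from pytorch_lightning.strategies import DDPStrategy

# select the strategy by name ...
trainer = Trainer(gpus=4, strategy="ddp")

# ... or pass a configured instance for finer control over its options
trainer = Trainer(gpus=4, strategy=DDPStrategy(find_unused_parameters=False))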

Precision Plugins

PrecisionPlugin

Base class for all plugins handling the precision-specific parts of the training.

MixedPrecisionPlugin

Base Class for mixed precision.

NativeMixedPrecisionPlugin

Plugin for Native Mixed Precision (AMP) training with torch.autocast.

ShardedNativeMixedPrecisionPlugin

Native AMP for Sharded Training.

ApexMixedPrecisionPlugin

Mixed precision plugin based on NVIDIA Apex (https://github.com/NVIDIA/apex).

DeepSpeedPrecisionPlugin

Precision plugin for DeepSpeed integration.

TPUPrecisionPlugin

Precision plugin for TPU integration.

TPUBf16PrecisionPlugin

Plugin that enables bfloat16 precision on TPUs.

DoublePrecisionPlugin

Plugin for training with double (torch.float64) precision.

FullyShardedNativeMixedPrecisionPlugin

Native AMP for Fully Sharded Training.

IPUPrecisionPlugin

Precision plugin for IPU integration.
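For orientation, a rough sketch of how common Trainer arguments map to these precision plugins; the exact mapping depends on the Lightning version, the hardware, and the installed extras (e.g. Apex):

from pytorch_lightning import Trainer

trainer = Trainer(gpus=1, precision=16)                      # NativeMixedPrecisionPlugin
trainer = Trainer(gpus=1, precision=16, amp_backend="apex")  # ApexMixedPrecisionPlugin
trainer = Trainer(gpus=1, precision=64)                      # DoublePrecisionPlugin
trainer = Trainer(tpu_cores=8, precision="bf16")             # TPUBf16PrecisionPlugin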

Cluster Environments

ClusterEnvironment

Specification of a cluster environment.

LightningEnvironment

The default environment used by Lightning for a single node or free cluster (not managed).

LSFEnvironment

An environment for running on clusters managed by the LSF resource manager.

TorchElasticEnvironment

Environment for fault-tolerant and elastic training with torchelastic.

KubeflowEnvironment

Environment for distributed training using the PyTorchJob operator from Kubeflow.

SLURMEnvironment

Cluster environment for training on a cluster managed by SLURM.
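For a cluster that none of the built-in environments cover, the ClusterEnvironment base class can be subclassed. A minimal sketch, assuming the cluster exposes its configuration through environment variables; the variable names are hypothetical, and the exact set of methods and properties to implement can differ between Lightning versions:

import os

from pytorch_lightning import Trainer
from pytorch_lightning.plugins.environments import ClusterEnvironment


class MyClusterEnvironment(ClusterEnvironment):
    @property
    def creates_processes_externally(self) -> bool:
        # True when the cluster manager launches the training processes itself
        return True

    def world_size(self) -> int:
        return int(os.environ["WORLD_SIZE"])

    def global_rank(self) -> int:
        return int(os.environ["RANK"])

    def local_rank(self) -> int:
        return int(os.environ["LOCAL_RANK"])

    def node_rank(self) -> int:
        return int(os.environ["NODE_RANK"])

    @property
    def main_address(self) -> str:
        return os.environ["MASTER_ADDRESS"]

    @property
    def main_port(self) -> int:
        return int(os.environ["MASTER_PORT"])


# pass the environment to the Trainer alongside the chosen strategy
trainer = Trainer(gpus=4, num_nodes=2, strategy="ddp", plugins=[MyClusterEnvironment()])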
