ParallelPlugin

class pytorch_lightning.plugins.training_type.ParallelPlugin(parallel_devices=None, cluster_environment=None)[source]

Bases: pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, abc.ABC

Plugin for training with multiple processes in parallel.

all_gather(tensor, group=None, sync_grads=False)[source]

Perform an all_gather on all processes.

Return type

Tensor
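
A minimal usage sketch, assuming a per-process scalar metric; the module, metric value, and print step below are illustrative, and the call goes through LightningModule.all_gather, which Lightning routes to the active training type plugin.

    import torch
    import pytorch_lightning as pl

    class GatherSketch(pl.LightningModule):
        def validation_step(self, batch, batch_idx):
            # Hypothetical per-process scalar metric computed on this rank.
            local_acc = torch.tensor(0.9, device=self.device)
            # With N processes the gathered result gains a leading dimension of size N.
            gathered = self.all_gather(local_acc, sync_grads=False)
            if self.trainer.is_global_zero:
                # Only the global rank zero prints the cross-process mean.
                print("mean accuracy across processes:", gathered.float().mean().item())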

block_backward_sync()[source]

Blocks DDP gradient synchronization during the backward pass. This is useful for skipping the sync when accumulating gradients, which reduces communication overhead.

Returns

context manager with gradient synchronization turned off
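
A minimal sketch of how a training loop could use the returned context manager to skip the gradient sync on accumulation steps; the plugin, loss, and accumulating names are illustrative placeholders, not part of the Lightning API.

    from contextlib import nullcontext

    def backward_with_optional_sync(plugin, loss, accumulating: bool) -> None:
        """Illustrative helper: run backward, syncing gradients only on the last micro-batch."""
        # block_backward_sync() yields a context manager that suppresses DDP's
        # gradient all-reduce, so accumulation steps stay local and cheap.
        ctx = plugin.block_backward_sync() if accumulating else nullcontext()
        with ctx:
            loss.backward()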

static configure_sync_batchnorm(model)[source]

Add global batchnorm for a model spread across multiple GPUs and nodes.

Override this to synchronize batchnorm between specific process groups instead of the whole world, or to use a different implementation such as Apex's synced batchnorm.

Parameters

model (LightningModule) – pointer to current LightningModule.

Return type

LightningModule

Returns

LightningModule with batchnorm layers synchronized between process groups
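
One possible override, sketched as a subclass of the DDP plugin, that limits the synchronization to a custom process group; the subclass name and the node_group placeholder are hypothetical.

    import torch
    from pytorch_lightning.plugins.training_type import DDPPlugin

    class SubgroupSyncBNPlugin(DDPPlugin):
        """Illustrative plugin that syncs batchnorm statistics within one process group only."""

        @staticmethod
        def configure_sync_batchnorm(model):
            # Hypothetical group; in practice it would come from
            # torch.distributed.new_group(ranks=[...]). None falls back to the
            # default (world) group, matching the base behaviour.
            node_group = None
            return torch.nn.SyncBatchNorm.convert_sync_batchnorm(model, process_group=node_group)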

reconciliate_processes(trace)[source]

Function to reconcile processes on failure.

reduce_boolean_decision(decision)[source]

Reduce the early stopping decision across all processes.

Return type

bool
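
A short sketch of the intended use, assuming an early-stopping style check: each rank forms a local boolean and the plugin reduces it so every process ends up with the same decision. The helper and its inputs are illustrative.

    def should_stop_everywhere(plugin, monitored: float, best_so_far: float) -> bool:
        """Illustrative early-stopping check that keeps every rank in agreement."""
        # Each process first forms its own local opinion (here: the monitored
        # value did not improve on this rank) ...
        local_decision = monitored >= best_so_far
        # ... and the plugin reduces the booleans so the returned value is
        # identical on all ranks, preventing some ranks from stopping early.
        return plugin.reduce_boolean_decision(local_decision)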

teardown()[source]

This method is called to tear down the training process. It is the right place to release memory and free other resources.

Return type

None

property is_global_zero: bool

Whether the current process is the rank zero process, not only on the local node but across all nodes.

Return type

bool
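
A sketch of the typical guard pattern built on this property: perform filesystem or logging work on the global rank zero only. The helper function and file path are illustrative.

    def write_summary(plugin, path: str = "summary.txt") -> None:
        """Illustrative rank-zero guard: one process across all nodes writes the file."""
        if plugin.is_global_zero:
            # Ranks other than the global zero skip this block, avoiding
            # concurrent writes to the same file from every node.
            with open(path, "w") as f:
                f.write("training finished\n")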

property lightning_module

Returns the pure LightningModule, without any potential parallel wrappers.

property on_gpu: bool

Returns whether the current process runs on a GPU.

Return type

bool

property on_tpu: bool

Returns whether the current process runs on a TPU.

Return type

bool

abstract property root_device: torch.device

Returns the root device.

Return type

device