ParallelPlugin

class pytorch_lightning.plugins.training_type.ParallelPlugin(parallel_devices=None, cluster_environment=None)[source]

Bases: pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, abc.ABC

Plugin for training with multiple processes in parallel.

all_gather(tensor, group=None, sync_grads=False)[source]

Perform an all_gather on all processes.

Return type

Tensor
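
A minimal usage sketch, assuming a per-process scalar metric; the module, metric value, and print step below are illustrative, and the call goes through LightningModule.all_gather, which Lightning routes to the active training type plugin.

    import torch
    import pytorch_lightning as pl

    class GatherSketch(pl.LightningModule):
        def validation_step(self, batch, batch_idx):
            # Hypothetical per-process scalar metric computed on this rank.
            local_acc = torch.tensor(0.9, device=self.device)
            # With N processes the gathered result gains a leading dimension of size N.
            gathered = self.all_gather(local_acc, sync_grads=False)
            if self.trainer.is_global_zero:
                # Only the global rank zero prints the cross-process mean.
                print("mean accuracy across processes:", gathered.float().mean().item())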

block_backward_sync()[source]

Blocks DDP gradient synchronization during the backward pass. This is useful for skipping the sync when accumulating gradients, which reduces communication overhead.

Returns

context manager with gradient synchronization turned off
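
A minimal sketch of how a training loop could use the returned context manager to skip the gradient sync on accumulation steps; the plugin, loss, and accumulating names are illustrative placeholders, not part of the Lightning API.

    from contextlib import nullcontext

    def backward_with_optional_sync(plugin, loss, accumulating: bool) -> None:
        """Illustrative helper: run backward, syncing gradients only on the last micro-batch."""
        # block_backward_sync() yields a context manager that suppresses DDP's
        # gradient all-reduce, so accumulation steps stay local and cheap.
        ctx = plugin.block_backward_sync() if accumulating else nullcontext()
        with ctx:
            loss.backward()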

static configure_sync_batchnorm(model)[source]

Add global batchnorm for a model spread across multiple GPUs and nodes.

Override this to synchronize batchnorm between specific process groups instead of the whole world, or to use a different implementation such as Apex's synced batchnorm.

Parameters

model (LightningModule) – pointer to current LightningModule.

Return type

LightningModule

Returns

LightningModule with batchnorm layers synchronized between process groups
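
One possible override, sketched as a subclass of the DDP plugin, that limits the synchronization to a custom process group; the subclass name and the node_group placeholder are hypothetical.

    import torch
    from pytorch_lightning.plugins.training_type import DDPPlugin

    class SubgroupSyncBNPlugin(DDPPlugin):
        """Illustrative plugin that syncs batchnorm statistics within one process group only."""

        @staticmethod
        def configure_sync_batchnorm(model):
            # Hypothetical group; in practice it would come from
            # torch.distributed.new_group(ranks=[...]). None falls back to the
            # default (world) group, matching the base behaviour.
            node_group = None
            return torch.nn.SyncBatchNorm.convert_sync_batchnorm(model, process_group=node_group)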

reconciliate_processes(trace)[source]

Function to reconcile processes on failure.

reduce_boolean_decision(decision)[source]

Reduce the early stopping decision across all processes.

Return type

bool
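
A short sketch of the intended use, assuming an early-stopping style check: each rank forms a local boolean and the plugin reduces it so every process ends up with the same decision. The helper and its inputs are illustrative.

    def should_stop_everywhere(plugin, monitored: float, best_so_far: float) -> bool:
        """Illustrative early-stopping check that keeps every rank in agreement."""
        # Each process first forms its own local opinion (here: the monitored
        # value did not improve on this rank) ...
        local_decision = monitored >= best_so_far
        # ... and the plugin reduces the booleans so the returned value is
        # identical on all ranks, preventing some ranks from stopping early.
        return plugin.reduce_boolean_decision(local_decision)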

teardown()[source]

This method is called to tear down the training process. It is the right place to release memory and free other resources.

Return type

None

property is_global_zero: bool

Whether the current process is the rank zero process, not only on the local node but across all nodes.

Return type

bool
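
A sketch of the typical guard pattern built on this property: perform filesystem or logging work on the global rank zero only. The helper function and file path are illustrative.

    def write_summary(plugin, path: str = "summary.txt") -> None:
        """Illustrative rank-zero guard: one process across all nodes writes the file."""
        if plugin.is_global_zero:
            # Ranks other than the global zero skip this block, avoiding
            # concurrent writes to the same file from every node.
            with open(path, "w") as f:
                f.write("training finished\n")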

property lightning_module

Returns the pure LightningModule, without any potential parallel wrappers.

property on_gpu: bool

Returns whether the current process runs on a GPU.

Return type

bool

property on_tpu: bool

Returns whether the current process runs on a TPU.

Return type

bool

abstract property root_device: torch.device

Returns the root device.

Return type

device