ParallelPlugin

class pytorch_lightning.plugins.training_type.ParallelPlugin(parallel_devices=None, cluster_environment=None)

Bases: pytorch_lightning.plugins.training_type.training_type_plugin.TrainingTypePlugin, abc.ABC

Plugin for training with multiple processes in parallel.

all_gather(tensor, group=None, sync_grads=False)

Perform an all_gather on all processes.
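The semantics of an all-gather can be illustrated without any distributed backend. The following is a pure-Python simulation, not the plugin's implementation: `simulated_all_gather` is a hypothetical helper that takes the value each rank contributes and returns what every rank would hold afterwards, assuming results are ordered by rank.

```python
from typing import List, Sequence


def simulated_all_gather(per_rank_values: Sequence[float]) -> List[List[float]]:
    """Return what each rank would hold after an all-gather.

    per_rank_values[i] is the value contributed by rank i. After the
    collective, every rank holds all values, ordered by rank (assumption).
    """
    gathered = list(per_rank_values)
    # Give each rank its own copy so mutating one does not affect the others.
    return [list(gathered) for _ in per_rank_values]


# Three simulated ranks, each contributing one value.
results = simulated_all_gather([0.5, 1.5, 2.5])
```

Every entry of `results` is the same list, which is the defining property of the operation: after the call, all processes see all contributions.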

block_backward_sync()

Blocks DDP gradient synchronization on the backward pass. This is useful for skipping synchronization while accumulating gradients, which reduces communication overhead.

Returns: context manager with gradient synchronization turned off
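The gradient-accumulation pattern this context manager supports can be sketched as follows. `ToyPlugin` and `run_steps` are hypothetical stand-ins written for illustration; they only count how often a synchronization would happen and do not reflect the real plugin's internals.

```python
from contextlib import contextmanager


class ToyPlugin:
    """Hypothetical stand-in that only tracks whether sync is enabled."""

    def __init__(self):
        self.sync_enabled = True
        self.sync_calls = 0

    @contextmanager
    def block_backward_sync(self):
        # Turn sync off inside the block and restore the old state afterwards.
        previous = self.sync_enabled
        self.sync_enabled = False
        try:
            yield
        finally:
            self.sync_enabled = previous

    def backward(self):
        # A real backward pass would communicate gradients here when enabled.
        if self.sync_enabled:
            self.sync_calls += 1


def run_steps(plugin, num_batches, accumulate_every):
    """Synchronize only on the last batch of each accumulation window."""
    for batch_idx in range(num_batches):
        is_last_in_window = (batch_idx + 1) % accumulate_every == 0
        if is_last_in_window:
            plugin.backward()
        else:
            with plugin.block_backward_sync():
                plugin.backward()
    return plugin.sync_calls


plugin = ToyPlugin()
sync_count = run_steps(plugin, num_batches=8, accumulate_every=4)
```

With eight batches and a window of four, only two of the eight backward passes synchronize in this sketch, which is where the communication savings come from.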

static configure_sync_batchnorm(model)

Add global batchnorm for a model spread across multiple GPUs and nodes.

Override to synchronize batchnorm between specific process groups instead of the whole world or use a different sync_bn like apex’s version.

Parameters: model (LightningModule) – pointer to current LightningModule.
Return type: LightningModule
Returns: LightningModule with batchnorm layers synchronized between process groups

reconciliate_processes(trace)

Function to reconcile processes on failure.

reduce_boolean_decision(decision)

Reduce the early stopping decision across all processes
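One way to reduce a boolean across processes is to sum the per-process votes and compare the total to the number of processes. The sketch below assumes a unanimous rule (the reduced decision is true only if every process voted true); this rule and the helper name `reduce_boolean_decision_sketch` are assumptions for illustration, and the per-process votes are passed in as a plain list rather than communicated.

```python
from typing import Sequence


def reduce_boolean_decision_sketch(decisions: Sequence[bool]) -> bool:
    """Combine per-process votes; true only if all processes agree (assumed rule)."""
    world_size = len(decisions)
    # Equivalent to a SUM reduction over integer-encoded votes.
    total = sum(int(d) for d in decisions)
    return total == world_size


all_agree = reduce_boolean_decision_sketch([True, True, True])
one_dissents = reduce_boolean_decision_sketch([True, False, True])
```

Encoding the vote as an integer means the decision can reuse an ordinary sum reduction instead of needing a dedicated boolean collective.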

property is_global_zero

Whether the current process is the rank zero process not only on the local node, but for all nodes.
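The difference between local and global rank zero can be made concrete with a small sketch. It assumes every node runs the same number of processes and that global ranks are assigned node by node; the function names are hypothetical and are not the plugin's own.

```python
def global_rank(node_rank: int, local_rank: int, processes_per_node: int) -> int:
    """Global rank under the assumed node-major numbering."""
    return node_rank * processes_per_node + local_rank


def is_global_zero_sketch(node_rank: int, local_rank: int, processes_per_node: int) -> bool:
    """True only for the first process on the first node."""
    return global_rank(node_rank, local_rank, processes_per_node) == 0


# Local rank 0 on the second node is not the global rank-zero process.
first_on_node_0 = is_global_zero_sketch(node_rank=0, local_rank=0, processes_per_node=4)
first_on_node_1 = is_global_zero_sketch(node_rank=1, local_rank=0, processes_per_node=4)
```

This matters for actions that must happen exactly once per job, such as writing a checkpoint, as opposed to once per node.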

property lightning_module

Returns the pure LightningModule without potential wrappers.
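Unwrapping can be pictured as peeling off layers until the underlying module is reached. The sketch below assumes each wrapper exposes the wrapped object through a `module` attribute, as distributed wrappers commonly do; `ToyModule`, `ToyWrapper`, and `unwrap` are hypothetical names used only for illustration.

```python
class ToyModule:
    """Hypothetical stand-in for the user's LightningModule."""


class ToyWrapper:
    """Hypothetical wrapper exposing the wrapped object as `.module`."""

    def __init__(self, module):
        self.module = module


def unwrap(model, base_type):
    """Follow `.module` links until an instance of base_type is found."""
    while not isinstance(model, base_type):
        if not hasattr(model, "module"):
            raise TypeError(f"cannot unwrap object of type {type(model).__name__}")
        model = model.module
    return model


pure = ToyModule()
wrapped_twice = ToyWrapper(ToyWrapper(pure))
recovered = unwrap(wrapped_twice, ToyModule)
```

Here `recovered` is the same object as `pure`, regardless of how many wrapper layers were applied.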