HorovodPlugin¶

class pytorch_lightning.plugins.training_type.HorovodPlugin(parallel_devices=None, checkpoint_io=None)[source]¶

Plugin for Horovod distributed training integration.

all_gather(result, group=None, sync_grads=False)[source]¶

Perform an all_gather on all processes.
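The gather semantics can be illustrated without a distributed backend. The sketch below is a single-process stand-in, not the plugin's implementation: `simulate_all_gather` is a hypothetical helper, and a plain list plays the role of the per-process values.

```python
from typing import Any, List


def simulate_all_gather(per_rank_values: List[Any]) -> List[List[Any]]:
    """Hypothetical single-process stand-in for an all_gather.

    per_rank_values[i] is the value held by rank i. After the gather,
    every rank holds the values of all ranks, ordered by rank.
    """
    gathered = list(per_rank_values)
    # Each rank receives its own copy of the full, rank-ordered list.
    return [list(gathered) for _ in per_rank_values]


# Three simulated ranks, each contributing one value.
results = simulate_all_gather([10, 20, 30])
for rank, view in enumerate(results):
    print(f"rank {rank} sees {view}")
```

The point of the sketch is that, unlike a reduction, an all_gather keeps every contribution and delivers the complete set to every process.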

barrier(*args, **kwargs)[source]¶

Synchronizes all processes by blocking each one until the whole group has entered this function.
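As a rough analogy for this blocking behavior, the sketch below uses `threading.Barrier` from the Python standard library, with threads standing in for distributed processes. It does not call the plugin; the names `worker`, `arrived`, and `passed_with` are illustrative only.

```python
import threading
import time

NUM_WORKERS = 3
barrier = threading.Barrier(NUM_WORKERS)
arrived = []      # ranks in the order they reached the barrier
passed_with = []  # number of arrivals each worker observed after passing
lock = threading.Lock()


def worker(rank: int) -> None:
    time.sleep(0.01 * rank)  # workers reach the barrier at different times
    with lock:
        arrived.append(rank)
    barrier.wait()  # blocks until all NUM_WORKERS threads have called wait()
    with lock:
        passed_with.append(len(arrived))


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(passed_with)
```

No worker gets past `barrier.wait()` until every worker has recorded its arrival, which is the guarantee a distributed barrier provides across processes.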

broadcast(obj, src=0)[source]¶

Broadcasts an object to all processes.

Parameters

obj¶ (object) – the object to broadcast
src¶ (int) – the source rank. Defaults to 0

Return type

object
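A minimal single-process sketch of the broadcast semantics follows. `simulate_broadcast` is a hypothetical helper written for illustration, not part of the plugin; it assumes `src` is the index of the process whose object is sent to everyone.

```python
import copy
from typing import Any, List


def simulate_broadcast(per_rank_objs: List[Any], src: int = 0) -> List[Any]:
    """Hypothetical stand-in: every rank ends up with rank `src`'s object."""
    source_obj = per_rank_objs[src]
    # Each rank gets an independent copy, as it would after
    # serialization and transfer between separate processes.
    return [copy.deepcopy(source_obj) for _ in per_rank_objs]


# Only rank 0 holds the configuration before the broadcast.
before = [{"seed": 42}, None, None]
after = simulate_broadcast(before, src=0)
print(after)
```

A typical use is sharing a value chosen on one process, such as a random seed or a checkpoint path, so that all processes agree on it.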

post_backward(closure_loss)[source]¶

Run after precision plugin executes backward.

pre_dispatch()[source]¶

Hook to do something before the training/evaluation/prediction starts.

reduce(tensor, group=None, reduce_op='mean')[source]¶

Reduces a tensor from several distributed processes to one aggregated tensor.

Parameters

tensor¶ – the tensor to sync and reduce
group¶ (Optional[Any]) – the process group to gather results from. Defaults to all processes (world)
reduce_op¶ (Union[ReduceOp, str, None]) – the reduction operation. Defaults to 'mean'/'avg'. Can also be the string 'sum' to calculate the sum during reduction.

Returns

the reduced value; if the input was not a tensor, the output is returned unchanged
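The reduction options described here can be sketched in plain Python. This is an illustration under stated assumptions, not the plugin's code: `simulate_reduce` is a hypothetical helper, plain numbers stand in for tensors, any non-numeric value stands in for a non-tensor input, and treating `None` as the default mean is an assumption.

```python
from numbers import Number
from typing import Any, List, Optional


def simulate_reduce(per_rank_values: List[Any], reduce_op: Optional[str] = "mean") -> Any:
    """Hypothetical stand-in: combine one value per rank into a single result."""
    first = per_rank_values[0]
    # Stand-in for the "not a tensor" case: return the input unchanged.
    if isinstance(first, bool) or not isinstance(first, Number):
        return first
    total = sum(per_rank_values)
    if reduce_op in (None, "mean", "avg"):  # None treated as the default (assumption)
        return total / len(per_rank_values)
    if reduce_op == "sum":
        return total
    raise ValueError(f"unrecognized reduce_op: {reduce_op!r}")


losses = [1.0, 2.0, 3.0]  # one loss value per simulated rank
mean_loss = simulate_reduce(losses)              # average across ranks
sum_loss = simulate_reduce(losses, "sum")        # sum across ranks
passthrough = simulate_reduce(["not a tensor"])  # returned unchanged
print(mean_loss, sum_loss, passthrough)
```

Averaging is the usual choice for logging a loss that each process computed on its own shard of data, while summing suits counts such as the number of correct predictions.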

Called by the accelerator to finish setup.

This method is called to tear down the training process.

It is the right place to release memory and free other resources.