HorovodPlugin

class pytorch_lightning.plugins.training_type.HorovodPlugin(parallel_devices=None)

Plugin for Horovod distributed training integration.

all_gather(result, group=None, sync_grads=False)

Perform an all_gather on all processes.
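The effect of an all_gather can be illustrated without Horovod installed. The following is a minimal pure-Python sketch, not the plugin's implementation: `simulate_all_gather` is a hypothetical helper, and plain floats stand in for the per-process tensors. It only shows the general collective pattern, in which every process ends up with the values contributed by all processes, ordered by rank.

```python
from typing import List, Sequence


def simulate_all_gather(per_rank_values: Sequence[float]) -> List[List[float]]:
    """Hypothetical helper: what each rank would hold after an all_gather.

    per_rank_values[i] is the value contributed by rank i. This is an
    illustration of the collective's semantics only; no Horovod call is made.
    """
    gathered = list(per_rank_values)
    # Every rank receives its own copy of the full, rank-ordered list.
    return [list(gathered) for _ in per_rank_values]


# Three simulated processes, each contributing one value.
results = simulate_all_gather([0.5, 1.5, 2.5])
for rank, held in enumerate(results):
    print(f"rank {rank} holds {held}")
```

In this simulation each of the three ranks ends with the same three-element list, which is the property that distinguishes an all_gather from a gather to a single root.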

barrier(*args, **kwargs)

Forces all possibly joined processes to wait for each other.
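As an analogy only, the waiting behaviour of a barrier can be sketched with threads from the Python standard library. Horovod synchronises separate processes rather than threads, and this sketch does not call the plugin; `NUM_WORKERS` and `worker` are hypothetical names introduced for the illustration.

```python
import threading

NUM_WORKERS = 3  # hypothetical number of participants
barrier = threading.Barrier(NUM_WORKERS)
lock = threading.Lock()
arrived = []               # ranks that have reached the barrier
counts_after_barrier = []  # how many had arrived when each worker resumed


def worker(rank: int) -> None:
    with lock:
        arrived.append(rank)
    # Blocks until all NUM_WORKERS workers have called wait().
    barrier.wait()
    with lock:
        counts_after_barrier.append(len(arrived))


threads = [threading.Thread(target=worker, args=(r,)) for r in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts_after_barrier)
```

Because no worker passes the barrier until every worker has reached it, each worker observes that all participants have already arrived by the time it resumes.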

broadcast(obj, src=0)

Broadcasts an object to all processes.

post_backward(closure_loss)

Run after the precision plugin executes backward.

pre_dispatch()

Hook to do something before the training/evaluation/prediction starts.

reduce(tensor, group=None, reduce_op='mean')

Reduces a tensor from several distributed processes to one aggregated tensor.

Parameters

tensor – the tensor to sync and reduce
group (Optional[Any]) – the process group to gather results from. Defaults to all processes (world).
reduce_op (Union[ReduceOp, str, None]) – the reduction operation. Defaults to 'mean'/'avg'. Can also be the string 'sum' to calculate the sum during reduction.

Returns

The reduced value. If the input was not a tensor, the output is returned unchanged.
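The documented behaviour of reduce (averaging by default, summing on request, and passing non-tensor input through unchanged) can be sketched in plain Python. This is a simulation under stated assumptions, not the plugin's code: `simulate_reduce` is a hypothetical helper, floats stand in for tensors, and treating `reduce_op=None` the same as the default is an assumption of this sketch.

```python
from typing import Any, Optional, Sequence


def simulate_reduce(per_rank_values: Sequence[Any],
                    reduce_op: Optional[str] = "mean") -> Any:
    """Hypothetical helper mimicking the documented reduce semantics.

    per_rank_values[i] is the value held by rank i; floats stand in for
    tensors. The return value is what one rank would see after the reduction.
    """
    first = per_rank_values[0]
    # Stand-in for "input was not a tensor": return it unchanged.
    if isinstance(first, bool) or not isinstance(first, (int, float)):
        return first
    if reduce_op in (None, "mean", "avg"):  # None handled as default: assumption
        return sum(per_rank_values) / len(per_rank_values)
    if reduce_op == "sum":
        return sum(per_rank_values)
    raise ValueError(f"unsupported reduce_op in this sketch: {reduce_op!r}")


print(simulate_reduce([1.0, 2.0, 3.0]))                    # mean across ranks
print(simulate_reduce([1.0, 2.0, 3.0], reduce_op="sum"))   # sum across ranks
print(simulate_reduce(["not a tensor", "not a tensor"]))   # passed through
```

Averaging is a sensible default for quantities such as losses and metrics, where each process computes the value on its own shard of the data; a sum is more appropriate for counts.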