Plugin for Horovod distributed training integration.
all_gather(result, group=None, sync_grads=False)¶
Perform an all_gather on all processes.
- Return type
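A minimal pure-Python sketch of all_gather semantics (not Horovod's implementation): every rank contributes its local values and every rank receives the same concatenation, in rank order. The function `all_gather_sim` and its argument are hypothetical illustrations, not part of the plugin API.

```python
def all_gather_sim(local_values):
    """Simulate all_gather across len(local_values) ranks.

    local_values[i] holds rank i's local data (a list of numbers
    standing in for a tensor). Every rank receives the concatenation
    of all ranks' data, in rank order.
    """
    gathered = [v for rank_data in local_values for v in rank_data]
    # Each rank gets its own copy of the full gathered result.
    return [list(gathered) for _ in local_values]

# Example: 3 ranks, each holding one value.
per_rank = all_gather_sim([[1], [2], [3]])
print(per_rank[0])  # every rank sees [1, 2, 3]
```

In the real plugin, `sync_grads=True` additionally makes the gather differentiable so gradients flow back to each contributing process.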
barrier(*args, **kwargs)¶
Forces all possibly joined processes to wait for each other.
model_to_device()¶
Moves the model to the correct device.
post_backward(closure_loss, should_accumulate, optimizer, opt_idx)¶
Run after the precision plugin executes backward.
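The accumulate-then-sync pattern behind this hook can be sketched in pure Python (a hypothetical simulation, not the plugin's code): while `should_accumulate` is true, gradients stay local; once accumulation finishes, they are all-reduced (averaged) across workers.

```python
def post_backward_sim(local_grads, should_accumulate):
    """Simulate the post-backward sync decision.

    local_grads[i] is worker i's locally accumulated gradient
    (a single number standing in for a tensor).
    """
    if should_accumulate:
        # Still accumulating: skip the expensive cross-worker sync.
        return local_grads
    # Accumulation done: all-reduce with mean across workers.
    mean = sum(local_grads) / len(local_grads)
    return [mean] * len(local_grads)

print(post_backward_sim([1.0, 3.0], should_accumulate=True))   # [1.0, 3.0]
print(post_backward_sim([1.0, 3.0], should_accumulate=False))  # [2.0, 2.0]
```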
pre_dispatch()¶
Hook to do something before the training/evaluation/prediction starts.
reduce(tensor, group=None, reduce_op='mean')¶
Reduces a tensor from several distributed processes to one aggregated tensor.
- Returns
Reduced value, except when the input was not a tensor, in which case the output remains unchanged.
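The documented reduce behavior can be illustrated with a small pure-Python sketch (hypothetical; `reduce_sim` is not part of the plugin API): numeric per-worker values are aggregated according to `reduce_op`, while non-tensor input passes through unchanged.

```python
def reduce_sim(values, reduce_op="mean"):
    """Simulate reduce across workers; values[i] is worker i's number."""
    if not all(isinstance(v, (int, float)) for v in values):
        # Mirrors the documented passthrough for non-tensor input.
        return values
    total = sum(values)
    return total / len(values) if reduce_op == "mean" else total

print(reduce_sim([2.0, 4.0]))         # 3.0 (mean is the default)
print(reduce_sim([2.0, 4.0], "sum"))  # 6.0
print(reduce_sim(["not", "a", "tensor"]))  # returned unchanged
```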
setup(model)¶
Called by the accelerator to finish setup.
root_device¶
Returns the root device.