Implements data-parallel training in a single process: the model is replicated to each device, and each device receives a split of the data.
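As an illustration, a minimal sketch of the same idea in plain PyTorch using torch.nn.DataParallel (the toy model and shapes are assumptions, not part of this API):

    import torch
    import torch.nn as nn

    # Hypothetical toy model; any nn.Module behaves the same way.
    model = nn.Linear(32, 4)
    if torch.cuda.device_count() > 1:
        # Replicates the module to every visible GPU; each replica
        # receives a slice of the input batch along dimension 0.
        model = nn.DataParallel(model)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)

    x = torch.randn(16, 32, device=device)
    y = model(x)  # the batch of 16 is split across the available devices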
Acts as a synchronization barrier: forces all participating processes to wait for each other.
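In a single process there are no peers to synchronize, so the call degenerates to a no-op; a hedged sketch of how such a barrier is commonly implemented with torch.distributed (the function name mirrors the hook, and the guard logic is an assumption):

    import torch.distributed as dist

    def barrier() -> None:
        # With a single process there are no peers to wait for, so this
        # is a no-op; with an initialized process group it blocks until
        # every process in the group reaches this call.
        if dist.is_available() and dist.is_initialized():
            dist.barrier()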
Moves the model to the correct device.
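A minimal sketch of what this hook typically amounts to (the signature is an assumption):

    import torch
    import torch.nn as nn

    def model_to_device(model: nn.Module, root_device: torch.device) -> None:
        # nn.Module.to() moves all parameters and buffers in place.
        model.to(root_device)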
reduce(tensor, *args, **kwargs)
Reduces a tensor from all parallel processes to one aggregated tensor.
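Since single-process data parallelism keeps all per-device values in one process, the reduction can be a purely local aggregation; a sketch under that assumption (treating "mean" as the default op is an assumption, not a documented contract):

    import torch

    def reduce(tensor, reduce_op: str = "mean"):
        # Collapses per-device values into one aggregated value.
        # No inter-process communication is needed here, because all
        # values are already held by the single training process.
        if isinstance(tensor, torch.Tensor):
            return tensor.float().mean() if reduce_op == "mean" else tensor.sum()
        return tensor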
Reduces the early-stopping decision across all processes.
- Return type: bool
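A hedged sketch of one common way to combine such a decision across processes, requiring a unanimous vote to stop (the unanimity rule and the function body are assumptions; only the torch.distributed collectives are real API):

    import torch
    import torch.distributed as dist

    def reduce_boolean_decision(decision: bool) -> bool:
        # Without a process group there is nothing to reduce.
        if not (dist.is_available() and dist.is_initialized()):
            return decision
        # Sum the per-process votes (gloo backend shown; nccl would
        # require a CUDA tensor) and stop only on a unanimous vote.
        flag = torch.tensor(int(decision))
        dist.all_reduce(flag, op=dist.ReduceOp.SUM)
        return flag.item() == dist.get_world_size()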
Called by the accelerator to finish setup.
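A sketch of what finishing setup typically involves for single-process data parallelism, assuming CUDA devices (the signature and the wrapping step are assumptions):

    import torch
    import torch.nn as nn

    def setup(model: nn.Module, parallel_devices: list) -> nn.Module:
        # Finish setup: place the model on the root device, then wrap it
        # so each forward pass scatters the batch across the devices.
        model = model.to(parallel_devices[0])
        return nn.DataParallel(model, device_ids=parallel_devices)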
Returns the root device.
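A hypothetical sketch of such a property; the class and attribute names are made up for illustration:

    import torch

    class _StrategySketch:
        # Hypothetical holder; the real strategy tracks its configured
        # devices in a similar attribute.
        def __init__(self, parallel_devices):
            self.parallel_devices = parallel_devices

        @property
        def root_device(self) -> torch.device:
            # By convention, the first configured device is the root.
            return self.parallel_devices[0]

    print(_StrategySketch([torch.device("cpu")]).root_device)  # cpu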