- class pytorch_lightning.strategies.BaguaStrategy(algorithm='gradient_allreduce', flatten=True, accelerator=None, parallel_devices=None, cluster_environment=None, checkpoint_io=None, precision_plugin=None, **bagua_kwargs)¶
Strategy for training using the Bagua library, with advanced distributed training algorithms and system optimizations.
This strategy requires the bagua package to be installed. See installation guide for more information.
BaguaStrategy is only supported on GPU and on Linux systems.
- Parameters
algorithm (str) – Distributed algorithm used to do the actual communication and update. Built-in algorithms include “gradient_allreduce”, “bytegrad”, “decentralized”, “low_precision_decentralized”, “qadam” and “async”.
bagua_kwargs (Dict[str, Any]) – Additional keyword arguments that will be passed to initialize the Bagua algorithm. More details on the keyword arguments accepted by each algorithm can be found in the documentation.
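A minimal sketch of how the strategy is passed to a Trainer, assuming pytorch_lightning and bagua are installed on a Linux machine with GPUs; the device count and algorithm choice below are illustrative:

```python
import pytorch_lightning as pl
from pytorch_lightning.strategies import BaguaStrategy

# Illustrative only: BaguaStrategy requires Linux, GPUs, and the
# bagua package. The algorithm kwarg selects one of the built-in
# distributed algorithms listed above.
trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    strategy=BaguaStrategy(algorithm="gradient_allreduce"),
)
```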
- barrier(*args, **kwargs)¶
Synchronizes all processes, blocking each process until the whole group enters this function.
- broadcast(obj, src=0)¶
Broadcasts an object to all processes.
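Broadcast semantics can be pictured in a single process as copying the source rank's object to every rank; the ranks and payload below are hypothetical:

```python
# Per-rank objects before the broadcast; only rank 0 (the default src)
# holds the payload in this hypothetical setup.
before = {0: {"lr": 0.1}, 1: None, 2: None}
src = 0

# After broadcast(obj, src=0), every rank holds the object from src.
after = {rank: before[src] for rank in before}
print(after[2])  # {'lr': 0.1}
```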
- reduce(tensor, group=None, reduce_op='mean')¶
Reduces a tensor from several distributed processes to one aggregated tensor.
- Returns
The reduced value, except when the input was not a tensor, in which case the output remains unchanged.
- setup(trainer)¶
Sets up plugins for the trainer fit and creates optimizers.
- teardown()¶
This method is called to tear down the training process.
It is the right place to release memory and free other resources.
- Return type
None