RPCPlugin

class pytorch_lightning.plugins.training_type.RPCPlugin(rpc_timeout_sec=torch.distributed.rpc.constants.DEFAULT_RPC_TIMEOUT_SEC, parallel_devices=None, num_nodes=None, cluster_environment=None, sync_batchnorm=None, **kwargs)

Bases: pytorch_lightning.plugins.training_type.ddp.DDPPlugin

Base class for RPC plugins, built on top of DDP. RPC communication behaves differently from DDP: worker processes are not necessarily required to run the same code as the main process. This creates edge cases in which logic has to be redefined. This class collects the special cases that must be handled when building custom RPC plugins.

rpc_save_model(trainer, save_model_fn, filepath)

Override to save the model to disk. This is required because the main process has to aggregate model states from the RPC processes.

Parameters

trainer – The trainer object.
save_model_fn (Callable) – The function that saves the final model.
filepath (str) – The filepath to save the model to.

Return type

None
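
The override pattern described for rpc_save_model can be sketched as follows. This is a minimal illustration, not Lightning's implementation. So that it runs without Lightning installed, it uses a hypothetical stand-in base class (RPCPluginStandIn) rather than the real RPCPlugin. The subclass MainProcessSavePlugin, its is_main_process flag, and the fake_save helper are likewise invented for the example. It also assumes that save_model_fn is called as save_model_fn(trainer, filepath), which the documentation above does not state.

```python
from typing import Any, Callable, List, Tuple


class RPCPluginStandIn:
    """Hypothetical stand-in for RPCPlugin; not part of Lightning."""

    def rpc_save_model(self, trainer: Any, save_model_fn: Callable, filepath: str) -> None:
        raise NotImplementedError


class MainProcessSavePlugin(RPCPluginStandIn):
    """Hypothetical plugin that only lets the main process write the model."""

    def __init__(self, is_main_process: bool) -> None:
        # Assumption: the caller knows whether this is the main process.
        self.is_main_process = is_main_process

    def rpc_save_model(self, trainer: Any, save_model_fn: Callable, filepath: str) -> None:
        if not self.is_main_process:
            # Worker processes skip saving entirely.
            return
        # A real plugin would aggregate model states from the RPC
        # processes here, before delegating to the save function.
        # Assumed call convention: save_model_fn(trainer, filepath).
        save_model_fn(trainer, filepath)


# Records every save request so the behaviour can be inspected.
saved: List[Tuple[Any, str]] = []


def fake_save(trainer: Any, filepath: str) -> None:
    """Hypothetical save function that records calls instead of writing to disk."""
    saved.append((trainer, filepath))
```

With this sketch, calling rpc_save_model on an instance created with is_main_process=False leaves saved empty, while an instance created with is_main_process=True forwards the trainer and filepath to fake_save exactly once.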