RPCSequentialPlugin

class pytorch_lightning.plugins.training_type.RPCSequentialPlugin(balance=None, microbatches=8, checkpoint='except_last', balance_mode='balance_by_size', pipelined_backward=True, rpc_timeout_sec=torch.distributed.rpc.constants.DEFAULT_RPC_TIMEOUT_SEC, **kwargs)[source]

Bases: pytorch_lightning.plugins.training_type.rpc.RPCPlugin
Provides sequential model parallelism for an nn.Sequential module. If the module requires lots of memory, Pipe can be used to reduce this by leveraging multiple GPUs.

Pipeline parallelism comes with checkpointing to reduce the peak memory required to train while minimizing device under-utilization. This is turned on by default and can be turned off via the checkpoint argument.
You should determine the balance when defining the plugin, or you can pass an example input array via the LightningModule to infer a balance. The module will be partitioned across multiple devices according to the given balance. You may also rely on your own heuristics to find an optimal configuration.
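For orientation, here is a minimal sketch of wiring the plugin into a Trainer with an explicit balance. LitModel, its layer sizes, and the two-GPU balance=[2, 2] split are hypothetical; exact Trainer flags (gpus, accelerator) and the attribute name the plugin expects for the sequential layers (shown here as sequential_module) may differ across Lightning versions.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from pytorch_lightning import Trainer
from pytorch_lightning.plugins.training_type import RPCSequentialPlugin


class LitModel(pl.LightningModule):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        # The plugin partitions this nn.Sequential across the available GPUs.
        self.sequential_module = nn.Sequential(
            nn.Linear(32, 32), nn.ReLU(),
            nn.Linear(32, 32), nn.ReLU(),
        )

    def forward(self, x):
        return self.sequential_module(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


# balance=[2, 2]: the first two layers go on GPU 0, the last two on GPU 1.
trainer = Trainer(
    gpus=2,
    accelerator="ddp",
    plugins=[RPCSequentialPlugin(balance=[2, 2])],
)
```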
Parameters

- balance (Optional[List[int]]) – The balance of the model, i.e. [2, 2] (two layers on each GPU). If not provided, assumes the user provides an input example array to find a balance on all GPUs (see the sketch after this parameter list).
- microbatches (int) – Allows for parallelization to reduce device under-utilization by splitting the batch into smaller micro-batches.
- checkpoint (str) – Enables gradient checkpointing. One of ['always', 'except_last', 'never'].
- balance_mode (str) – Type of balance heuristic to use if the balance is to be inferred.
  - 'balance_by_size': checks the memory usage of each layer and determines the balance
  - 'balance_by_time': checks the execution time of each layer and determines the balance
- pipelined_backward (Optional[bool]) – If True, call torch.autograd.backward once per micro-batch on the backward pass (instead of once for the whole batch). This works around a potential deadlock in PyTorch when using tensor parallelism at the same time. Defaults to True if get_model_parallel_world_size() > 1.
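When no balance is given, the plugin can derive one from the LightningModule's example_input_array using the chosen balance_mode heuristic. A minimal sketch, reusing the hypothetical LitModel and two-GPU setup from above; attribute and flag names may differ across Lightning versions.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from pytorch_lightning.plugins.training_type import RPCSequentialPlugin


class LitModel(pl.LightningModule):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.sequential_module = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 10),
        )
        # With balance=None, this sample input lets the plugin profile each
        # layer (by time or memory) and infer the split across GPUs.
        self.example_input_array = torch.randn(8, 32)


trainer = pl.Trainer(
    gpus=2,
    accelerator="ddp",
    # No balance given: infer it via the 'balance_by_time' heuristic.
    plugins=[RPCSequentialPlugin(balance_mode="balance_by_time")],
)
```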
post_optimizer_step(optimizer, optimizer_idx, **kwargs)[source]

Hook to do something after each optimizer step.

Return type: None
pre_backward(closure_loss, should_accumulate, optimizer, opt_idx)[source]

Run before the precision plugin executes backward.