RPCSequentialPlugin(balance=None, microbatches=8, checkpoint='except_last', balance_mode='balance_by_size', pipelined_backward=True, rpc_timeout_sec=torch.distributed.rpc.constants.DEFAULT_RPC_TIMEOUT_SEC, **kwargs)
Provides sequential model parallelism for the nn.Sequential module. If the module requires lots of memory, Pipe can be used to reduce this by leveraging multiple GPUs.
Pipeline parallelism comes with checkpointing to reduce the peak memory required to train, while minimizing device under-utilization. This is turned on by default and can be turned off via the checkpoint argument.
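As a rough illustration of the microbatching idea that pipeline parallelism relies on, here is a plain-Python sketch (the function name and chunking scheme are assumptions for illustration, not the plugin's internals):

```python
def split_into_microbatches(batch, microbatches):
    """Split a batch (a list of samples) into `microbatches` roughly equal
    contiguous chunks that can be pipelined through the devices in turn."""
    base, rem = divmod(len(batch), microbatches)
    chunks, start = [], 0
    for i in range(microbatches):
        # The first `rem` chunks take one extra sample each.
        end = start + base + (1 if i < rem else 0)
        if end > start:  # skip empty chunks when batch < microbatches
            chunks.append(batch[start:end])
        start = end
    return chunks

# Example: a batch of 10 samples split into 4 microbatches.
print(split_into_microbatches(list(range(10)), 4))
# → [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]
```

Smaller microbatches let later pipeline stages start working before the whole batch has passed through earlier stages, at the cost of some per-chunk overhead.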
You should determine the balance when defining the plugin, or you can pass an example input array via the LightningModule to infer a balance. The module will be partitioned across multiple devices according to the given balance. You may also rely on your own heuristics to find an optimal configuration.
Parameters

- balance – The partitioning of the model's layers across devices, e.g. [2, 2] (two layers on each device). If not provided, assumes the user provides an input example array to find a balance on all GPUs.
- microbatches – Reduces device under-utilization by splitting the batch into further smaller batches that are pipelined through the devices.
- balance_mode – Type of balance heuristic to use if the balance is to be inferred.
    - 'balance_by_size': checks memory usage of each layer and determines balance
    - 'balance_by_time': checks time of each layer and determines balance
- pipelined_backward – If True, call backward once per microbatch on the backward pass (instead of once for the whole batch). This works around a potential deadlock in pytorch when using tensor parallelism at the same time. Defaults to True if get_model_parallel_world_size() > 1.
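To make the balance heuristics concrete, here is a minimal greedy sketch of inferring a contiguous balance from per-layer costs (memory usage for 'balance_by_size', runtime for 'balance_by_time'). The function and cost model are illustrative assumptions, not the plugin's actual algorithm:

```python
def infer_balance(layer_costs, num_devices):
    """Greedily assign consecutive layers to devices so that each device's
    total cost stays near an equal share. Returns a balance such as [2, 2]."""
    target = sum(layer_costs) / num_devices  # ideal cost per device
    balance, count, acc = [], 0, 0.0
    for i, cost in enumerate(layer_costs):
        count += 1
        acc += cost
        remaining_layers = len(layer_costs) - (i + 1)
        remaining_devices = num_devices - len(balance) - 1
        # Force a cut if every remaining device needs one of the remaining
        # layers; otherwise cut once this device has roughly its share.
        must_cut = remaining_layers == remaining_devices
        if len(balance) < num_devices - 1 and (must_cut or acc >= target):
            balance.append(count)
            count, acc = 0, 0.0
    balance.append(count)  # last device takes whatever remains
    return balance

# Two heavy layers followed by four light ones, over two devices:
print(infer_balance([4, 4, 1, 1, 1, 1], 2))  # → [2, 4]
```

With 'balance_by_size' the costs would come from profiling each layer's memory footprint on a sample input; with 'balance_by_time' from timing each layer's forward pass.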
post_optimizer_step(optimizer, optimizer_idx, **kwargs)
Hook to do something after each optimizer step.
- Return type: None
pre_backward(closure_loss, should_accumulate, optimizer, opt_idx)
Run before the precision plugin executes backward.
rpc_save_model(trainer, save_model_fn, filepath)
Override to save the model to disk. This is required because the main process must handle aggregating the model states from the RPC worker processes.
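The aggregation this hook is responsible for can be sketched in plain Python. This is a conceptual illustration only; the helper name and the shard layout are assumptions, not the plugin's API:

```python
def aggregate_and_save(shard_states, save_fn):
    """Merge per-worker partial state dicts into one full state dict on the
    main process, then hand it to a save callback (e.g. one writing to disk)."""
    full_state = {}
    for shard in shard_states:  # one partial state dict per RPC worker
        overlap = full_state.keys() & shard.keys()
        if overlap:
            # Each parameter should live on exactly one pipeline stage.
            raise ValueError(f"duplicate parameters across shards: {sorted(overlap)}")
        full_state.update(shard)
    save_fn(full_state)
    return full_state

# Example: two workers each hold half of the model's parameters.
saved = []
aggregate_and_save(
    [{"layer0.weight": [0.1], "layer0.bias": [0.2]}, {"layer1.weight": [0.3]}],
    saved.append,
)
```

In the real plugin the per-stage states come over RPC and `save_model_fn` persists the checkpoint; the key point is that only the main process sees the fully assembled state.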