FSDPPrecision

class lightning.pytorch.plugins.precision.FSDPPrecision(precision, scaler=None)

Bases: Precision

Precision plugin for training with Fully Sharded Data Parallel (FSDP).

Warning

This is an experimental feature.

Parameters:
  • precision (Literal['32-true', '16-true', 'bf16-true', '16-mixed', 'bf16-mixed']) – Full precision (32-true), half precision (16-true, bf16-true) or mixed precision (16-mixed, bf16-mixed).

  • scaler (Optional[ShardedGradScaler]) – An optional torch.distributed.fsdp.sharded_grad_scaler.ShardedGradScaler to use.

Raises:

ValueError – If an unsupported precision is provided.
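The accepted precision strings and the ValueError can be illustrated with a small standalone sketch. It does not import Lightning and is not the plugin's actual implementation: the names `PrecisionInput` and `check_precision` are hypothetical, and only the five literal values are taken from the parameter description above.

```python
from typing import Literal, get_args

# The five values listed for the `precision` parameter above.
PrecisionInput = Literal["32-true", "16-true", "bf16-true", "16-mixed", "bf16-mixed"]


def check_precision(precision: str) -> str:
    """Return `precision` unchanged if supported, otherwise raise ValueError.

    Hypothetical helper for illustration only; not part of Lightning.
    """
    supported = get_args(PrecisionInput)
    if precision not in supported:
        raise ValueError(
            f"precision={precision!r} is not supported; expected one of {supported}"
        )
    return precision


print(check_precision("bf16-mixed"))

try:
    check_precision("8-mixed")
except ValueError as err:
    print(f"rejected: {err}")
```

Keeping the allowed values in a single `Literal` type lets a type checker and the runtime check share one source of truth.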

clip_grad_by_norm(*_, **__)

Clip gradients by norm.

Return type:

None

convert_input(data)

Convert model inputs (forward) to the floating point precision type of this plugin.

In the base precision plugin this is a no-op, because the data is assumed to already have the desired type (torch.float32 by default).

Return type:

Any

convert_output(data)

Convert outputs to the floating point precision type expected after the model’s forward pass.

In the base precision plugin this is a no-op, because the data is assumed to already have the desired type (torch.float32 by default).

Return type:

Any

forward_context()

A context manager that wraps the model's forward, training_step, evaluation_step, and predict_step.

Return type:

ContextManager

load_state_dict(state_dict)

Called when loading a checkpoint. Implement this to reload the precision plugin state from the given precision plugin state_dict.

Parameters:

state_dict (Dict[str, Any]) – The precision plugin state returned by state_dict.

Return type:

None

module_init_context()

Instantiate module parameters or tensors in the precision type this plugin handles.

This is optional and depends on the precision limitations during optimization.

Return type:

ContextManager

optimizer_step(optimizer, model, closure, **kwargs)

Hook to run the optimizer step.

Return type:

Any

pre_backward(tensor, module)

Runs before the precision plugin executes backward.

Parameters:
  • tensor (Tensor) – The tensor that will be used for backpropagation.

  • module (LightningModule) – The module that was involved in producing the tensor and whose parameters need the gradients.

Return type:

Tensor
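When a gradient scaler is in use (see the scaler parameter), loss scaling generally touches two points of a training step: the loss is multiplied by a scale factor before backward, and the gradients are divided by the same factor before the optimizer update, which is skipped if any gradient is non-finite. The pure-Python sketch below illustrates that general technique on a one-parameter problem. How it maps onto pre_backward and optimizer_step is an assumption, not a copy of this plugin's code, and `scaled_sgd_step` with its arguments is hypothetical.

```python
import math
from typing import Tuple


def scaled_sgd_step(w: float, scale: float, lr: float = 0.1) -> Tuple[float, bool]:
    """One SGD step on loss(w) = (w - 3)^2 using loss scaling.

    Returns the (possibly updated) weight and whether the update was applied.
    Hypothetical helper for illustration only.
    """
    # Before backward: the loss is scale * (w - 3)^2, so its gradient is
    # scaled by the same factor. In real float16 training this keeps small
    # gradients from underflowing to zero.
    scaled_grad = scale * 2.0 * (w - 3.0)

    # Optimizer step: unscale first, then skip the update on inf/nan gradients.
    grad = scaled_grad / scale
    if not math.isfinite(grad):
        return w, False
    return w - lr * grad, True


w = 0.0
for _ in range(50):
    w, applied = scaled_sgd_step(w, scale=1024.0)
print(round(w, 4), applied)
```

Because the gradient is unscaled before the update, the scale factor does not change the effective learning rate.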

state_dict()

Called when saving a checkpoint. Implement this to generate the precision plugin state_dict.

Return type:

Dict[str, Any]

Returns:

A dictionary containing precision plugin state.
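The state_dict and load_state_dict hooks form a save and restore pair. The sketch below shows that round trip with toy classes. It assumes, for illustration only, that the state worth saving is the state of an optional gradient scaler. `ToyScaler` and `ToyPrecision` are hypothetical and do not reproduce Lightning's classes.

```python
from typing import Any, Dict, Optional


class ToyScaler:
    """Hypothetical stand-in for a gradient scaler with checkpointable state."""

    def __init__(self, scale: float = 65536.0) -> None:
        self.scale = scale

    def state_dict(self) -> Dict[str, Any]:
        return {"scale": self.scale}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        self.scale = state_dict["scale"]


class ToyPrecision:
    """Hypothetical plugin that delegates its state to an optional scaler."""

    def __init__(self, scaler: Optional[ToyScaler] = None) -> None:
        self.scaler = scaler

    def state_dict(self) -> Dict[str, Any]:
        # Nothing to save when no scaler is attached.
        return self.scaler.state_dict() if self.scaler is not None else {}

    def load_state_dict(self, state_dict: Dict[str, Any]) -> None:
        # Ignore the state when there is no scaler to restore it into.
        if self.scaler is not None:
            self.scaler.load_state_dict(state_dict)


# Save from one plugin instance and restore into a fresh one.
saved = ToyPrecision(ToyScaler(scale=1024.0)).state_dict()
restored = ToyPrecision(ToyScaler())
restored.load_state_dict(saved)
print(restored.scaler.scale)
```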

tensor_init_context()

Controls how tensors get created (device, dtype).

Return type:

ContextManager
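The context-manager hooks on this page (forward_context, module_init_context, tensor_init_context) each return an object used in a with statement. One common pattern for a hook that controls the dtype of newly created tensors is to switch a process-wide default on entry and restore it on exit. The sketch below shows that pattern with a toy default and toy tensors. It does not use PyTorch, all names in it are hypothetical, and whether this plugin works the same way is an assumption.

```python
from contextlib import contextmanager
from typing import Dict, Iterator

# Hypothetical process-wide default, standing in for a framework's default dtype.
_default_dtype = "float32"


def get_default_dtype() -> str:
    return _default_dtype


@contextmanager
def dtype_context(dtype: str) -> Iterator[None]:
    """Temporarily switch the default dtype, restoring it on exit."""
    global _default_dtype
    previous = _default_dtype
    _default_dtype = dtype
    try:
        yield
    finally:
        # Restore the previous default even if the body raised.
        _default_dtype = previous


def make_tensor() -> Dict[str, str]:
    """Toy 'tensor' that records the dtype it was created with."""
    return {"dtype": get_default_dtype()}


with dtype_context("bfloat16"):
    inside = make_tensor()
outside = make_tensor()
print(inside["dtype"], outside["dtype"])
```

Restoring the default in a finally block keeps an exception inside the with body from leaking the temporary dtype into later code.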