IPUStrategy

class lightning.pytorch.strategies.IPUStrategy(accelerator=None, device_iterations=1, autoreport=False, autoreport_dir=None, parallel_devices=None, cluster_environment=None, checkpoint_io=None, precision_plugin=None, training_opts=None, inference_opts=None)[source]

Bases: lightning.pytorch.strategies.parallel.ParallelStrategy

Plugin for training on IPU devices.

Warning

This is an experimental feature.
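As a rough usage sketch, a strategy instance is normally handed to the Trainer through its `strategy` argument. The keyword values below are illustrative, the `accelerator="ipu"` and `devices` arguments are assumptions about the target machine, and the helper name `build_trainer` is hypothetical. The imports are deferred because constructing the strategy needs Lightning with IPU support (and Graphcore's poptorch package) installed.

```python
# Keyword arguments named in the class signature above; the values are illustrative only.
STRATEGY_KWARGS = {"device_iterations": 1, "autoreport": False, "autoreport_dir": None}


def build_trainer(num_ipus: int = 8):
    """Hypothetical helper: build a Trainer that uses IPUStrategy."""
    # Deferred imports: these need Lightning (with IPU support) installed.
    from lightning.pytorch import Trainer
    from lightning.pytorch.strategies import IPUStrategy

    strategy = IPUStrategy(**STRATEGY_KWARGS)
    # accelerator="ipu" and devices=num_ipus are assumptions about the target machine.
    return Trainer(accelerator="ipu", devices=num_ipus, strategy=strategy)
```

Calling `build_trainer()` only makes sense on a machine with IPUs attached; elsewhere the deferred imports or the strategy constructor are expected to fail.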

Parameters
all_gather(tensor, group=None, sync_grads=False)[source]

Perform an all_gather on all processes.

Return type

Tensor
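To make the general idea of an all-gather concrete, here is a single-process simulation using plain Python lists. It is a sketch of the collective's semantics only, not this strategy's implementation; the function name `simulated_all_gather` is hypothetical.

```python
from typing import List, Sequence


def simulated_all_gather(per_rank_values: Sequence[float]) -> List[List[float]]:
    """Return what each rank would hold after an all-gather.

    per_rank_values[i] is the value contributed by rank i. After the
    collective, every rank holds the full list, ordered by rank.
    """
    gathered = list(per_rank_values)
    # Each rank receives its own copy of the complete list.
    return [list(gathered) for _ in per_rank_values]


results = simulated_all_gather([0.5, 1.5, 2.5])
print(results[0])  # the gathered values as seen by rank 0
```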

barrier(name=None)[source]

Synchronizes all processes, blocking each one until the whole group has entered this function.

Parameters

name (Optional[str]) – an optional name to pass into barrier.

Return type

None
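The blocking behaviour of a barrier can be illustrated with threads from the standard library. This is an analogy under the assumption that threads stand in for processes; it does not use the strategy itself, and names such as `worker` and `NUM_WORKERS` are hypothetical.

```python
import threading
from typing import List

NUM_WORKERS = 4
barrier = threading.Barrier(NUM_WORKERS)
events: List[str] = []
events_lock = threading.Lock()


def worker(rank: int) -> None:
    with events_lock:
        events.append(f"before-{rank}")
    # Block until all NUM_WORKERS threads have reached this point.
    barrier.wait()
    with events_lock:
        events.append(f"after-{rank}")


threads = [threading.Thread(target=worker, args=(rank,)) for rank in range(NUM_WORKERS)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because no thread passes `barrier.wait()` until every thread has arrived, all `before-*` entries are recorded ahead of any `after-*` entry, whatever order the threads happen to run in.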

batch_to_device(batch, device=None, dataloader_idx=0)[source]

Moves the batch to the correct device.

The returned batch is of the same type as the input batch, just having all tensors on the correct device.

Parameters
  • batch (Any) – The batch of samples to move to the correct device

  • device (Optional[device]) – The target device

  • dataloader_idx (int) – The index of the dataloader to which the batch belongs.

Return type

Any
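The "same type as the input batch" guarantee can be sketched with a recursive helper over nested containers. This is a minimal illustration, not the library's code: `FakeTensor` is a hypothetical stand-in for a real tensor, `move_to_device` is a hypothetical helper, and the device label `"ipu:0"` is just a string chosen for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for a tensor: a payload plus the device it lives on."""

    data: tuple
    device: str = "cpu"

    def to(self, device: str) -> "FakeTensor":
        return FakeTensor(self.data, device)


def move_to_device(batch: Any, device: str) -> Any:
    """Recursively move every FakeTensor in `batch`, preserving container types."""
    if isinstance(batch, FakeTensor):
        return batch.to(device)
    if isinstance(batch, dict):
        return {key: move_to_device(value, device) for key, value in batch.items()}
    if isinstance(batch, (list, tuple)):
        # Plain lists and tuples only; namedtuples would need special handling.
        return type(batch)(move_to_device(item, device) for item in batch)
    return batch  # non-tensor leaves (ints, strings, ...) are returned unchanged


batch = {"x": FakeTensor((1, 2)), "meta": ["id-7", FakeTensor((3,))]}
moved = move_to_device(batch, "ipu:0")
```

The result has the same dict-and-list shape as the input, with only the tensor-like leaves replaced.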

broadcast(obj, src=0)[source]

Broadcasts an object to all processes.

Parameters
  • obj (TypeVar(TBroadcast)) – the object to broadcast

  • src (int) – source rank

Return type

TypeVar(TBroadcast)
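The effect of a broadcast from a source rank can be simulated in one process as follows. This sketches the general semantics only and is not how the strategy implements it; `simulated_broadcast` and the example objects are hypothetical.

```python
import copy
from typing import Any, List


def simulated_broadcast(per_rank_objects: List[Any], src: int = 0) -> List[Any]:
    """Return what each rank holds after rank `src` broadcasts its object."""
    source_obj = per_rank_objects[src]
    # Every rank ends up with its own copy of the source rank's object.
    return [copy.deepcopy(source_obj) for _ in per_rank_objects]


after = simulated_broadcast([{"seed": 42}, {"seed": 7}, {"seed": 13}], src=0)
```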

model_to_device()[source]

Moves the model to the correct device.

Return type

None

on_predict_end()[source]

Called when predict ends.

Return type

None

on_predict_start()[source]

Called when predict begins.

Return type

None

on_test_end()[source]

Called when test ends.

Return type

None

on_test_start()[source]

Called when test begins.

Return type

None

on_train_batch_start(batch, batch_idx)[source]

Called in the training loop before anything happens for that batch.

Return type

None

on_train_end()[source]

Called when train ends.

Return type

None

on_train_start()[source]

Called when train begins.

Return type

None
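The train hooks described above (on_train_start, on_train_batch_start, on_train_end) are listed alphabetically, so the order in which they fire is easy to miss. The toy loop below sketches that order, based only on the hook descriptions; `RecordingStrategy` and `toy_fit` are hypothetical stand-ins, not Lightning's training loop.

```python
from typing import Any, Iterable, List


class RecordingStrategy:
    """Hypothetical stand-in that records when each hook is called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def on_train_start(self) -> None:
        self.calls.append("on_train_start")

    def on_train_batch_start(self, batch: Any, batch_idx: int) -> None:
        self.calls.append(f"on_train_batch_start[{batch_idx}]")

    def on_train_end(self) -> None:
        self.calls.append("on_train_end")


def toy_fit(strategy: RecordingStrategy, batches: Iterable[Any]) -> None:
    strategy.on_train_start()
    for batch_idx, batch in enumerate(batches):
        # The hook runs before any work is done on the batch.
        strategy.on_train_batch_start(batch, batch_idx)
        # Placeholder for the actual training step.
        strategy.calls.append(f"step[{batch_idx}]")
    strategy.on_train_end()


strategy = RecordingStrategy()
toy_fit(strategy, ["batch-a", "batch-b"])
```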

on_validation_end()[source]

Called when validation ends.

Return type

None

on_validation_start()[source]

Called when validation begins.

Return type

None

predict_step(*args, **kwargs)[source]

The actual predict step.

See LightningModule.predict_step() for more details.

Return type

Union[Tensor, Dict[str, Any]]

reduce(tensor, *args, **kwargs)[source]

Reduces the given tensor (e.g. across GPUs/processes).

Parameters
  • tensor (Union[Tensor, Any]) – the tensor to sync and reduce

  • group – the process group to reduce

  • reduce_op – the reduction operation. Defaults to ‘mean’. Can also be a string ‘sum’ or ReduceOp.

Return type

Union[Tensor, Any]
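The difference between the ‘mean’ and ‘sum’ reduction operations can be shown with plain floats standing in for per-process tensors. This is a single-process sketch of the arithmetic only; `simulated_reduce` is a hypothetical helper and does not reflect how the strategy performs the reduction.

```python
from typing import Sequence


def simulated_reduce(per_rank_values: Sequence[float], reduce_op: str = "mean") -> float:
    """Combine one value per rank into a single value."""
    if not per_rank_values:
        raise ValueError("need at least one value to reduce")
    total = sum(per_rank_values)
    if reduce_op == "sum":
        return total
    if reduce_op == "mean":
        return total / len(per_rank_values)
    raise ValueError(f"unsupported reduce_op: {reduce_op!r}")


losses = [0.5, 1.0, 1.5]  # one loss value per simulated process
mean_loss = simulated_reduce(losses)  # 1.0
sum_loss = simulated_reduce(losses, "sum")  # 3.0
```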

setup(trainer)[source]

Sets up plugins for the trainer fit and creates optimizers.

Parameters

trainer (Trainer) – the trainer instance

Return type

None

setup_optimizers(trainer)[source]

Creates optimizers and schedulers.

Parameters

trainer (Trainer) – the Trainer these optimizers should be connected to

Return type

None

teardown()[source]

This method is called to tear down the training process.

It is the right place to release memory and free other resources.

Return type

None

test_step(*args, **kwargs)[source]

The actual test step.

See LightningModule.test_step() for more details.

Return type

Union[Tensor, Dict[str, Any], None]

training_step(*args, **kwargs)[source]

The actual training step.

See LightningModule.training_step() for more details.

Return type

Union[Tensor, Dict[str, Any]]

validation_step(*args, **kwargs)[source]

The actual validation step.

See LightningModule.validation_step() for more details.

Return type

Union[Tensor, Dict[str, Any], None]

property is_global_zero: bool

Whether the current process is the rank zero process not only on the local node, but for all nodes.

Return type

bool
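The distinction between "rank zero on the local node" and "rank zero for all nodes" can be sketched as below. The `node_rank` and `local_rank` parameters and both function names are assumptions made for illustration; the property itself takes no arguments, and how a given strategy computes it may differ.

```python
def is_local_zero(local_rank: int) -> bool:
    """True for the first process on each node."""
    return local_rank == 0


def is_global_zero(node_rank: int, local_rank: int) -> bool:
    """True only for the first process on the first node."""
    return node_rank == 0 and local_rank == 0


# Two nodes with two processes each: only (node 0, local 0) is global zero.
flags = {
    (node, local): is_global_zero(node, local)
    for node in range(2)
    for local in range(2)
}
```

A typical use of such a flag is to guard work that should happen exactly once per run, such as writing a log file.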

property root_device: torch.device

Return the root device.

Return type

device