pytorch_lightning.trainer.distrib_parts module

Root module for all distributed operations in Lightning. Currently supports training on CPU, GPU (dp, ddp, ddp2, horovod) and TPU.
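
For orientation, a minimal sketch of how these backends are typically selected through the Trainer (argument names follow the Lightning API documented here; defaults and availability depend on your installation):

    from pytorch_lightning import Trainer

    # DataParallel: single node, each batch is split across 4 GPUs
    trainer = Trainer(gpus=4, distributed_backend='dp')

    # DistributedDataParallel: one process per GPU
    trainer = Trainer(gpus=[0, 1], distributed_backend='ddp')

    # Horovod: processes are launched externally, e.g. via horovodrun
    trainer = Trainer(distributed_backend='horovod')

    # TPU: train on all 8 cores
    trainer = Trainer(tpu_cores=8)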

class pytorch_lightning.trainer.distrib_parts.TrainerDPMixin[source]

Bases: abc.ABC

_TrainerDPMixin__transfer_batch_to_device(batch, device)[source]
copy_trainer_model_properties(model)[source]
dp_train(model)[source]
abstract get_model()[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

LightningModule

horovod_train(model)[source]
abstract init_optimizers(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

Tuple[List, List, List]

abstract is_function_implemented(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

bool

abstract reinit_scheduler_properties(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

abstract run_pretrain_routine(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

abstract setup(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

None

single_gpu_train(model)[source]
tpu_train(tpu_core_idx, model)[source]
transfer_batch_to_gpu(batch, gpu_id=None)[source]

Transfers the data to the GPU.

Parameters
  • batch (Any) – A tensor or collection of tensors.

  • gpu_id (Optional[int]) – The id of the GPU device. If omitted, the first available GPU is chosen.

Returns

the tensor on the GPU device.
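
The sketch below illustrates what this transfer involves for a tensor or a nested collection of tensors; the helper move_to_device is hypothetical and not part of the library:

    import torch

    def move_to_device(batch, device):
        """Hypothetical helper: recursively move all tensors in a batch to `device`."""
        if isinstance(batch, torch.Tensor):
            return batch.to(device)
        if isinstance(batch, (list, tuple)):
            return type(batch)(move_to_device(x, device) for x in batch)
        if isinstance(batch, dict):
            return {k: move_to_device(v, device) for k, v in batch.items()}
        return batch  # non-tensor leaves are returned unchanged

    # e.g. move a (features, labels) pair to the first GPU (assumes CUDA is available)
    batch = (torch.randn(32, 10), torch.randint(0, 2, (32,)))
    batch = move_to_device(batch, torch.device('cuda', 0))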

transfer_batch_to_tpu(batch, tpu_id=None)[source]

Transfers the data to the TPU.

Parameters
  • batch (Any) – A tensor or collection of tensors.

  • tpu_id (Optional[int]) – The id of the TPU core. If omitted, the first available core is chosen.

Returns

the tensor on the TPU device.
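
On the TPU side the same idea applies, except that the target device comes from torch_xla (shown only as an illustration; it requires a TPU runtime with torch_xla installed):

    import torch
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()                # the TPU core assigned to this process
    batch = torch.randn(32, 10).to(device)  # tensors move with the usual .to(device)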

amp_level: str = None[source]
data_parallel_device_ids: ... = None[source]
global_rank: int = None[source]
logger: ... = None[source]
on_colab_kaggle: str = None[source]
on_gpu: bool = None[source]
precision: ... = None[source]
progress_bar_callback: ... = None[source]
root_gpu: ... = None[source]
save_spawn_weights: Callable = None[source]
single_gpu: bool = None[source]
testing: bool = None[source]
tpu_global_core_rank: int = None[source]
tpu_id: Optional[int] = None[source]
tpu_local_core_rank: int = None[source]
abstract property use_amp[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

bool

use_ddp: bool = None[source]
use_ddp2: bool = None[source]
use_dp: bool = None[source]
use_tpu: bool = None[source]
pytorch_lightning.trainer.distrib_parts._check_data_type(device_ids)[source]

Checks that the device_ids argument is one of None, int, str, or list. Raises a MisconfigurationException otherwise.

Parameters

device_ids (Any) – gpus/tpu_cores parameter as passed to the Trainer

Return type

None
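
As an illustration (a sketch; _check_data_type is a private helper, so this import may change between releases):

    from pytorch_lightning.trainer.distrib_parts import _check_data_type

    _check_data_type(None)      # ok
    _check_data_type(2)         # ok: int
    _check_data_type('0,1')     # ok: str
    _check_data_type([0, 1])    # ok: list

    try:
        _check_data_type(1.5)   # float is not an accepted type
    except Exception as err:    # a MisconfigurationException is expected here
        print(type(err).__name__)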

pytorch_lightning.trainer.distrib_parts._normalize_parse_gpu_input_to_list(gpus)[source]
Return type

Optional[List[int]]

pytorch_lightning.trainer.distrib_parts._normalize_parse_gpu_string_input(s)[source]
Return type

Union[int, List[int]]

pytorch_lightning.trainer.distrib_parts._parse_gpu_ids(gpus)[source]

Parses the GPU ids given in the format as accepted by the Trainer.

Parameters

gpus (Union[int, str, List[int], None]) – An int -1 or the string ‘-1’ indicates that all available GPUs should be used. A list of ints or a string containing a comma-separated list of integers indicates the specific GPUs to use. An int 0 means that no GPUs should be used. Any int N > 0 indicates that GPUs [0..N) should be used.

Return type

Optional[List[int]]

Returns

a list of gpus to be used or None if no GPUs were requested

If no GPUs are available but the gpus variable requests GPUs, a MisconfigurationException is raised.
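
A hedged sketch of the expected behaviour on a machine with four visible GPUs (actual results depend on the devices present, since the ids are sanitized against the available GPUs):

    from pytorch_lightning.trainer.distrib_parts import _parse_gpu_ids

    _parse_gpu_ids(0)        # -> None            (no GPUs requested)
    _parse_gpu_ids(3)        # -> [0, 1, 2]       (first N GPUs)
    _parse_gpu_ids(-1)       # -> [0, 1, 2, 3]    (all available GPUs)
    _parse_gpu_ids('1,3')    # -> [1, 3]          (specific GPUs from a string)
    _parse_gpu_ids([0, 2])   # -> [0, 2]          (specific GPUs from a list)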

pytorch_lightning.trainer.distrib_parts._parse_tpu_cores(tpu_cores)[source]

Parses the tpu_cores given in the format as accepted by the Trainer.

Parameters

tpu_cores (Union[int, str, List]) – An int 1 or the string ‘1’ indicates that a single core with multi-processing should be used. An int 8 or the string ‘8’ indicates that all 8 cores with multi-processing should be used. A list of ints or a string containing a comma-separated list of integers indicates the specific TPU core to use.

Return type

Union[List[int], int, None]

Returns

a list of tpu_cores to be used or None if no TPU cores were requested
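
A hedged sketch of the expected parsing (illustrative; exact return values may differ between releases):

    from pytorch_lightning.trainer.distrib_parts import _parse_tpu_cores

    _parse_tpu_cores(1)      # -> 1    (a single core with multi-processing)
    _parse_tpu_cores('8')    # -> 8    (all 8 cores with multi-processing)
    _parse_tpu_cores([1])    # -> [1]  (a specific core)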

pytorch_lightning.trainer.distrib_parts._parse_tpu_cores_str(tpu_cores)[source]
pytorch_lightning.trainer.distrib_parts._tpu_cores_valid(tpu_cores)[source]
pytorch_lightning.trainer.distrib_parts.determine_root_gpu_device(gpus)[source]
Parameters

gpus (List[int]) – non-empty list of ints representing which gpus to use

Return type

Optional[int]

Returns

designated root GPU device id
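
In practice the root device is taken to be the first id in the list, e.g. (a sketch):

    from pytorch_lightning.trainer.distrib_parts import determine_root_gpu_device

    determine_root_gpu_device([2, 0, 1])   # -> 2 (the first GPU id becomes the root)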

pytorch_lightning.trainer.distrib_parts.get_all_available_gpus()[source]
Return type

List[int]

Returns

a list of all available gpus
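
A minimal sketch of what such a query amounts to in plain PyTorch (the library may implement it differently):

    import torch

    def all_available_gpus():
        # indices of every CUDA device visible to PyTorch
        return list(range(torch.cuda.device_count()))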

pytorch_lightning.trainer.distrib_parts.pick_multiple_gpus(nb)[source]
pytorch_lightning.trainer.distrib_parts.pick_single_gpu(exclude_gpus)[source]
pytorch_lightning.trainer.distrib_parts.retry_jittered_backoff(func, num_retries=5, cap_delay=1.0, base_delay=0.01)[source]

Retries a function with jittered exponential backoff.

Based on: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/

Parameters
  • func (Callable) – the function to retry

  • num_retries (int) – number of attempts

  • cap_delay (float) – maximum sleep time between retries

  • base_delay (float) – initial sleep time (0.01 s, i.e. 10 ms, by default)
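
A sketch of the decorrelated-jitter strategy described in the linked AWS post (illustrative, not necessarily the library's exact implementation; the name retry_with_jitter is hypothetical):

    import random
    import time

    def retry_with_jitter(func, num_retries=5, cap_delay=1.0, base_delay=0.01):
        """Call `func` until it succeeds, sleeping with decorrelated jitter between attempts."""
        sleep = base_delay
        for attempt in range(num_retries):
            try:
                return func()
            except Exception:
                if attempt == num_retries - 1:
                    raise  # out of retries: propagate the last error
            time.sleep(sleep)
            # decorrelated jitter: min(cap, uniform(base, 3 * previous sleep))
            sleep = min(cap_delay, random.uniform(base_delay, sleep * 3))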

pytorch_lightning.trainer.distrib_parts.sanitize_gpu_ids(gpus)[source]

Checks that each of the GPUs in the list is actually available. Raises a MisconfigurationException if any of the GPUs is not available.

Parameters

gpus (List[int]) – list of ints corresponding to GPU indices

Return type

List[int]

Returns

unmodified gpus variable
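
A short usage sketch (the id 99 is chosen to be almost certainly unavailable):

    from pytorch_lightning.trainer.distrib_parts import sanitize_gpu_ids

    sanitize_gpu_ids([0])           # -> [0], provided GPU 0 exists

    try:
        sanitize_gpu_ids([0, 99])   # GPU 99 is not available
    except Exception as err:        # a MisconfigurationException is expected here
        print(type(err).__name__)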