pytorch_lightning.trainer.distrib_parts module

Root module for all distributed operations in Lightning. Currently supports training on CPU, GPU (dp, ddp, ddp2, horovod) and TPU.
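
For orientation, a minimal sketch of how these backends are typically selected through the Trainer (argument names follow the Lightning API documented here; defaults and availability depend on your installation):

    from pytorch_lightning import Trainer

    # DataParallel: single node, each batch is split across 4 GPUs
    trainer = Trainer(gpus=4, distributed_backend='dp')

    # DistributedDataParallel: one process per GPU
    trainer = Trainer(gpus=[0, 1], distributed_backend='ddp')

    # Horovod: processes are launched externally, e.g. via horovodrun
    trainer = Trainer(distributed_backend='horovod')

    # TPU: train on all 8 cores
    trainer = Trainer(tpu_cores=8)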

class pytorch_lightning.trainer.distrib_parts.TrainerDPMixin[source]

Bases: abc.ABC

_TrainerDPMixin__transfer_batch_to_device(batch, device)[source]
copy_trainer_model_properties(model)[source]
dp_train(model)[source]
abstract get_model()[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

LightningModule

horovod_train(model)[source]
abstract init_optimizers(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

Tuple[List, List, List]

abstract is_function_implemented(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

bool

abstract reinit_scheduler_properties(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

abstract run_pretrain_routine(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

abstract setup(*args)[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

None

single_gpu_train(model)[source]
tpu_train(tpu_core_idx, model)[source]
transfer_batch_to_gpu(batch, gpu_id=None)[source]

Transfers the data to the GPU.

Parameters
  • batch (Any) – A tensor or collection of tensors.

  • gpu_id (Optional[int]) – The id of the GPU device. If omitted, the first available GPU is chosen.

Returns

the tensor on the GPU device.
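
The sketch below illustrates what this transfer involves for a tensor or a nested collection of tensors; the helper move_to_device is hypothetical and not part of the library:

    import torch

    def move_to_device(batch, device):
        """Hypothetical helper: recursively move all tensors in a batch to `device`."""
        if isinstance(batch, torch.Tensor):
            return batch.to(device)
        if isinstance(batch, (list, tuple)):
            return type(batch)(move_to_device(x, device) for x in batch)
        if isinstance(batch, dict):
            return {k: move_to_device(v, device) for k, v in batch.items()}
        return batch  # non-tensor leaves are returned unchanged

    # e.g. move a (features, labels) pair to the first GPU (assumes CUDA is available)
    batch = (torch.randn(32, 10), torch.randint(0, 2, (32,)))
    batch = move_to_device(batch, torch.device('cuda', 0))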

transfer_batch_to_tpu(batch, tpu_id=None)[source]

Transfers the data to the TPU.

Parameters
  • batch (Any) – A tensor or collection of tensors.

  • tpu_id (Optional[int]) – The id of the TPU core. If omitted, the first available core is chosen.

Returns

the tensor on the TPU device.
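
On the TPU side the same idea applies, except that the target device comes from torch_xla (shown only as an illustration; it requires a TPU runtime with torch_xla installed):

    import torch
    import torch_xla.core.xla_model as xm

    device = xm.xla_device()                # the TPU core assigned to this process
    batch = torch.randn(32, 10).to(device)  # tensors move with the usual .to(device)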

amp_level: str = None[source]
data_parallel_device_ids: ... = None[source]
global_rank: int = None[source]
logger: ... = None[source]
on_colab_kaggle: str = None[source]
on_gpu: bool = None[source]
precision: ... = None[source]
progress_bar_callback: ... = None[source]
root_gpu: ... = None[source]
save_spawn_weights: Callable = None[source]
single_gpu: bool = None[source]
testing: bool = None[source]
tpu_global_core_rank: int = None[source]
tpu_id: Optional[int] = None[source]
tpu_local_core_rank: int = None[source]
abstract property use_amp[source]

Warning: this is just an empty shell for code implemented in another class.

Return type

bool

use_ddp: bool = None[source]
use_ddp2: bool = None[source]
use_dp: bool = None[source]
use_tpu: bool = None[source]
pytorch_lightning.trainer.distrib_parts._check_data_type(device_ids)[source]

Checks that the device_ids argument is one of None, int, str, or list. Raises a MisconfigurationException otherwise.

Parameters

device_ids (Any) – gpus/tpu_cores parameter as passed to the Trainer

Return type

None
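
As an illustration (a sketch; _check_data_type is a private helper, so this import may change between releases):

    from pytorch_lightning.trainer.distrib_parts import _check_data_type

    _check_data_type(None)      # ok
    _check_data_type(2)         # ok: int
    _check_data_type('0,1')     # ok: str
    _check_data_type([0, 1])    # ok: list

    try:
        _check_data_type(1.5)   # float is not an accepted type
    except Exception as err:    # a MisconfigurationException is expected here
        print(type(err).__name__)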

pytorch_lightning.trainer.distrib_parts._normalize_parse_gpu_input_to_list(gpus)[source]
Return type

Optional[List[int]]

pytorch_lightning.trainer.distrib_parts._normalize_parse_gpu_string_input(s)[source]
Return type

Union[int, List[int]]

pytorch_lightning.trainer.distrib_parts._parse_gpu_ids(gpus)[source]

Parses the GPU ids given in the format as accepted by the Trainer.

Parameters

gpus (Union[int, str, List[int], None]) – An int -1 or the string ‘-1’ indicates that all available GPUs should be used. A list of ints or a string containing a comma-separated list of integers indicates the specific GPUs to use. An int 0 means that no GPUs should be used. Any int N > 0 indicates that GPUs [0..N) should be used.

Return type

Optional[List[int]]

Returns

a list of gpus to be used or None if no GPUs were requested

If no GPUs are available but the gpus variable requests GPUs, a MisconfigurationException is raised.
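
A hedged sketch of the expected behaviour on a machine with four visible GPUs (actual results depend on the devices present, since the ids are sanitized against the available GPUs):

    from pytorch_lightning.trainer.distrib_parts import _parse_gpu_ids

    _parse_gpu_ids(0)        # -> None            (no GPUs requested)
    _parse_gpu_ids(3)        # -> [0, 1, 2]       (first N GPUs)
    _parse_gpu_ids(-1)       # -> [0, 1, 2, 3]    (all available GPUs)
    _parse_gpu_ids('1,3')    # -> [1, 3]          (specific GPUs from a string)
    _parse_gpu_ids([0, 2])   # -> [0, 2]          (specific GPUs from a list)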

pytorch_lightning.trainer.distrib_parts._parse_tpu_cores(tpu_cores)[source]

Parses the tpu_cores given in the format as accepted by the Trainer.

Parameters

tpu_cores (Union[int, str, List]) – An int 1 or the string ‘1’ indicates that a single core with multi-processing should be used. An int 8 or the string ‘8’ indicates that all 8 cores with multi-processing should be used. A list of ints or a string containing a comma-separated list of integers indicates the specific TPU core to use.

Return type

Union[List[int], int, None]

Returns

a list of tpu_cores to be used or None if no TPU cores were requested
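
A hedged sketch of the expected parsing (illustrative; exact return values may differ between releases):

    from pytorch_lightning.trainer.distrib_parts import _parse_tpu_cores

    _parse_tpu_cores(1)      # -> 1    (a single core with multi-processing)
    _parse_tpu_cores('8')    # -> 8    (all 8 cores with multi-processing)
    _parse_tpu_cores([1])    # -> [1]  (a specific core)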

pytorch_lightning.trainer.distrib_parts._parse_tpu_cores_str(tpu_cores)[source]
pytorch_lightning.trainer.distrib_parts._tpu_cores_valid(tpu_cores)[source]
pytorch_lightning.trainer.distrib_parts.determine_root_gpu_device(gpus)[source]
Parameters

gpus (List[int]) – non-empty list of ints representing which gpus to use

Return type

Optional[int]

Returns

designated root GPU device id
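
In practice the root device is taken to be the first id in the list, e.g. (a sketch):

    from pytorch_lightning.trainer.distrib_parts import determine_root_gpu_device

    determine_root_gpu_device([2, 0, 1])   # -> 2 (the first GPU id becomes the root)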

pytorch_lightning.trainer.distrib_parts.get_all_available_gpus()[source]
Return type

List[int]

Returns

a list of all available gpus
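
A minimal sketch of what such a query amounts to in plain PyTorch (the library may implement it differently):

    import torch

    def all_available_gpus():
        # indices of every CUDA device visible to PyTorch
        return list(range(torch.cuda.device_count()))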

pytorch_lightning.trainer.distrib_parts.pick_multiple_gpus(nb)[source]
pytorch_lightning.trainer.distrib_parts.pick_single_gpu(exclude_gpus)[source]
pytorch_lightning.trainer.distrib_parts.retry_jittered_backoff(func, num_retries=5, cap_delay=1.0, base_delay=0.01)[source]

Retries a function with jittered exponential backoff.

Based on: https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/

Parameters
  • func (Callable) – the function to retry

  • num_retries (int) – number of attempts

  • cap_delay (float) – maximum sleep time between retries

  • base_delay (float) – initial sleep time (0.01 s, i.e. 10 ms, by default)
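
A sketch of the decorrelated-jitter strategy described in the linked AWS post (illustrative, not necessarily the library's exact implementation; the name retry_with_jitter is hypothetical):

    import random
    import time

    def retry_with_jitter(func, num_retries=5, cap_delay=1.0, base_delay=0.01):
        """Call `func` until it succeeds, sleeping with decorrelated jitter between attempts."""
        sleep = base_delay
        for attempt in range(num_retries):
            try:
                return func()
            except Exception:
                if attempt == num_retries - 1:
                    raise  # out of retries: propagate the last error
            time.sleep(sleep)
            # decorrelated jitter: min(cap, uniform(base, 3 * previous sleep))
            sleep = min(cap_delay, random.uniform(base_delay, sleep * 3))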

pytorch_lightning.trainer.distrib_parts.sanitize_gpu_ids(gpus)[source]

Checks that each of the GPUs in the list is actually available. Raises a MisconfigurationException if any of the GPUs is not available.

Parameters

gpus (List[int]) – list of ints corresponding to GPU indices

Return type

List[int]

Returns

unmodified gpus variable
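
A short usage sketch (the id 99 is chosen to be almost certainly unavailable):

    from pytorch_lightning.trainer.distrib_parts import sanitize_gpu_ids

    sanitize_gpu_ids([0])           # -> [0], provided GPU 0 exists

    try:
        sanitize_gpu_ids([0, 99])   # GPU 99 is not available
    except Exception as err:        # a MisconfigurationException is expected here
        print(type(err).__name__)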