Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog.

[unReleased] - 2024-MM-DD

[unReleased] - Added

  • Enabled consolidating distributed checkpoints through fabric consolidate in the new CLI (#19560)

[unReleased] - Changed

  • Renamed lightning run model to fabric run (#19442, #19527)

  • The Fabric.rank_zero_first context manager now uses a barrier without timeout to avoid interrupting long-running tasks (#19448)

  • Fabric now raises an error if you forget to call fabric.backward() when it is needed by the strategy or precision selection (#19447, #19493)

  • _BackwardSyncControl can now control what to do when gradient accumulation is disabled (#19577)

[unReleased] - Deprecated

[unReleased] - Removed

[unReleased] - Fixed

  • Fixed an issue causing a TypeError when using torch.compile as a decorator (#19627)

[2.2.1] - 2024-03-04

[2.2.1] - Fixed

  • Fixed an issue with CSVLogger trying to append to file from a previous run when the version is set manually (#19446)

[2.2.0] - 2024-02-08

[2.2.0] - Added

  • Added lightning.fabric.utilities.ThroughputMonitor and lightning.fabric.utilities.Throughput to track throughput and log it (#18848)

  • Added lightning.fabric.utilities.AttributeDict for convenient dict-attribute access to represent state in script (#18943)

  • Added support for meta-device initialization and materialization of 4-bit Bitsandbytes layers (#19150)

  • Added TransformerEnginePrecision(fallback_compute_dtype=) to control the dtype of operations that don’t support fp8 (#19082)

  • Added support for clipping gradients by value with FSDP (#19236)

  • Added a utility function and CLI to consolidate FSDP sharded checkpoints into a single file (#19213)

  • Added support for re-compiling the model inside Fabric.setup() over the FSDP/DDP wrappers (#19280)
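
The dict-attribute access mentioned for AttributeDict in the list above can be illustrated with a small stand-in. This is a minimal sketch of the idea only: the class name AttributeDictSketch is hypothetical and this is not the library's implementation.

```python
class AttributeDictSketch(dict):
    """Hypothetical stand-in: a dict whose keys can also be used as attributes."""

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails, so dict methods still work.
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(f"Missing attribute '{key}'") from err

    def __setattr__(self, key, value):
        # Attribute assignment is stored as a regular dict item.
        self[key] = value


# Track training state in a script with either access style.
state = AttributeDictSketch(step=0, best_loss=float("inf"))
state.step += 1
state["best_loss"] = 0.25
```

Because the object is still a plain dict underneath, it can be passed anywhere a dictionary of state is expected.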

[2.2.0] - Changed

  • seed_everything() without passing in a seed no longer randomly selects a seed, and now defaults to 0 (#18846)

  • Changed the TransformerEnginePrecision(dtype=) argument to weights_dtype and made it required (#19082)

  • The columns in the metrics.csv file produced by CSVLogger are now sorted alphabetically (#19159)
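
The alphabetical column order described for metrics.csv can be reproduced with the standard csv module. The sketch below only illustrates the resulting layout; the helper name write_metrics_csv and the sample metric names are assumptions, and it does not use CSVLogger itself.

```python
import csv
import io


def write_metrics_csv(rows):
    """Write metric dicts to CSV text with alphabetically sorted columns."""
    # Collect every metric name seen across all rows, then sort the names.
    fieldnames = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical metrics: the second row introduces a new column.
rows = [
    {"step": 0, "train_loss": 0.9},
    {"step": 1, "train_loss": 0.7, "accuracy": 0.5},
]
csv_text = write_metrics_csv(rows)
header = csv_text.splitlines()[0]
```

Scripts that read metrics.csv by column position rather than by name are the ones most likely to be affected by this ordering.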

[2.2.0] - Removed

  • Removed support for PyTorch 1.12 (#19300)

[2.2.0] - Fixed

  • Fixed parsing of v100s GPUs in get_available_flops (#18952)

  • Fixed issue where the precision="transformer-engine" argument would not replace layers by default (#19082)

  • Fixed the input validation logic in FSDPStrategy to accept a device_mesh (#19392)

[2.1.4] - 2024-01-31

[2.1.4] - Fixed

  • Fixed an issue preventing Fabric from running on CPU when the system’s CUDA driver is outdated or broken (#19234)

  • Fixed typo in kwarg in SpikeDetection (#19282)

[2.1.3] - 2023-12-21

[2.1.3] - Fixed

  • Avoid moving the model to device if move_to_device=False is passed (#19152)

  • Fixed broadcast at initialization in MPIEnvironment (#19074)

[2.1.2] - 2023-11-15

[2.1.2] - Fixed

  • Fixed precision default from environment (#18928)

[2.1.1] - 2023-11-06

[2.1.1] - Changed

  • Calling a method other than forward that invokes submodules is now an error when the model is wrapped (e.g., with DDP) (#18819)

[2.1.1] - Fixed

  • Fixed false-positive warnings about method calls on the Fabric-wrapped module (#18819)

  • Refined the FSDP saving logic and error messaging when path exists (#18884)

  • Fixed layer conversion under Fabric.init_module() context manager when using the BitsandbytesPrecision plugin (#18914)

[2.1.0] - 2023-10-11

[2.1.0] - Added

  • Added support for the TPU-v4 architecture (#17227)

  • Added support for XLA’s new PJRT runtime (#17352)

  • Added support for Fully Sharded Data Parallel (FSDP) training with XLA (#18126, #18424, #18430)

  • Check for invalid TPU device inputs (#17227)

  • Added XLAStrategy(sync_module_states=bool) to control whether to broadcast the parameters to all devices (#17522)

  • Added support for joint setup of model and optimizer with FSDP (#17305)

  • Added support for handling multiple parameter groups in optimizers set up with FSDP (#17305)

  • Added support for saving and loading sharded model and optimizer state with FSDPStrategy (#17323)

  • Added a warning when calling methods on _FabricModule that bypass the strategy-specific wrappers (#17424)

  • Added Fabric.init_tensor() context manager to instantiate tensors efficiently directly on device and dtype (#17488)

  • Added Fabric.init_module() context manager to instantiate large models efficiently directly on device, dtype, and with sharding support (#17462)

    • Creates the model parameters in the desired dtype (torch.float32, torch.float64, torch.float16, or torch.bfloat16) depending on the ‘true’ precision choice in Fabric(precision='32-true'|'64-true'|'16-true'|'bf16-true')

    • Handles initialization for FSDP models before wrapping and the Zero stage 3 initialization for DeepSpeed before sharding

  • Added support for empty weight initialization with Fabric.init_module(empty_init=True) for checkpoint loading (#17627)

  • Added support for meta-device initialization with Fabric.init_module(empty_init=True) in FSDP (#18122)

  • Added lightning.fabric.plugins.Precision.module_init_context() and lightning.fabric.strategies.Strategy.module_init_context() context managers to control model and tensor instantiation (#17462)

  • Added the lightning.fabric.strategies.Strategy.tensor_init_context() context manager to instantiate tensors efficiently directly on device and dtype (#17607)

  • Run the DDP wrapper in a CUDA stream (#17334)

  • Added support for true half-precision as Fabric(precision="16-true"|"bf16-true") (#17287)

  • Added support for mixed 8-bit precision as Fabric(precision="transformer-engine") using Nvidia’s Transformer Engine (#17597)

  • Added support for linear layer quantization with Fabric(plugins=BitsandbytesPrecision()) using bitsandbytes (#18655)

  • Added error messaging for missed .launch() when it is required (#17570)

  • Added support for saving checkpoints with either full state-dict or sharded state dict via FSDPStrategy(state_dict_type="full"|"sharded") (#17526)

  • Added support for loading a full-state checkpoint file into a sharded model (#17623)

  • Added support for calling hooks on a LightningModule via Fabric.call (#17874)

  • Added the parameter Fabric.load(..., strict=True|False) to enable non-strict loading of partial checkpoint state (#17645)

  • Added the parameter Fabric.save(..., filter=...) to enable saving a partial checkpoint state (#17845)

  • Added support for loading optimizer states from a full-state checkpoint file (#17747)

  • Automatically call xla_model.mark_step() before saving checkpoints with XLA (#17882)

  • Automatically call xla_model.mark_step() after optimizer.step() with XLA (#17883)

  • Added support for all half-precision modes in FSDP precision plugin (#17807)

  • Added FSDPStrategy(activation_checkpointing_policy=...) to customize the layer policy for automatic activation checkpointing (requires torch>=2.1) (#18045)

  • Added a callback for spike-detection (#18014)

  • Added the ability to set the torch.distributed.fsdp.ShardingStrategy via string in FSDPStrategy (#18087)

  • Improved error messages when attempting to load a DeepSpeed checkpoint at an invalid path (#17795)

  • Added Fabric.load_raw() for loading raw PyTorch state dict checkpoints for model or optimizer objects (#18049)

  • Allowed accessing rank information in the main process before processes are launched when using the XLAStrategy (#18194)

  • Added automatic process cleanup to avoid zombie child processes and stalls when exceptions are raised (#18218)

  • Added validation of user input for devices and num_nodes when running with SLURM or TorchElastic (#18292)

  • Improved the error messaging and instructions when handling custom batch samplers in distributed settings (#18402)

  • Added support for saving and loading stateful objects other than modules and optimizers (#18513)

  • Enabled the default process group configuration for FSDP’s hybrid sharding (#18583)

  • Added lightning.fabric.utilities.suggested_max_num_workers to assist with setting a good value in distributed settings (#18591)

  • Added lightning.fabric.utilities.is_shared_filesystem utility function to automatically check whether the filesystem is shared between machines (#18586)

  • Removed support for PyTorch 1.11 (#18691)

  • Added support for passing the argument .load_state_dict(..., assign=True|False) on Fabric-wrapped modules in PyTorch 2.1 or newer (#18690)
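
The partial-checkpoint idea behind the Fabric.save(..., filter=...) entry in the list above can be sketched on plain dictionaries. This assumes that a filter maps a state key to a predicate over (item name, value); that shape is an assumption to verify against the Fabric.save documentation. The helper name apply_filters is hypothetical.

```python
def apply_filters(state, filters):
    """Return a copy of state where each filtered entry keeps only accepted items."""
    unknown = set(filters) - set(state)
    if unknown:
        raise KeyError(f"Filter keys without a matching state key: {sorted(unknown)}")
    filtered = {}
    for name, obj in state.items():
        keep = filters.get(name)
        if keep is None:
            # No filter registered for this entry: keep it unchanged.
            filtered[name] = obj
        else:
            # Assumption: the predicate receives the item name and its value.
            filtered[name] = {k: v for k, v in obj.items() if keep(k, v)}
    return filtered


# Hypothetical state: keep only the "head" parameters of the model entry.
state = {"model": {"encoder.weight": 1, "head.weight": 2, "head.bias": 3}, "step": 10}
filters = {"model": lambda name, value: name.startswith("head.")}
partial = apply_filters(state, filters)
```

A checkpoint saved this way is partial by design, which is where non-strict loading via Fabric.load(..., strict=False) becomes relevant.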

[2.1.0] - Changed

  • Allow using iterable-style datasets with TPUs (#17331)

  • Increased the minimum XLA requirement to 1.13 (#17368)

  • Fabric argument validation now only raises an error if conflicting settings are set through the CLI (#17679)

  • DataLoader re-instantiation is now only performed when a distributed sampler is required (#18191)

  • Improved the formatting of emitted warnings (#18288)

  • Broadcast and reduction of tensors with XLA-based strategies now preserve the input’s device (#18275)

  • Due to lack of reliability, Fabric now only runs on one GPU instead of all GPUs in a Jupyter notebook if devices="auto" (default) (#18291)

  • Enabled launching via torchrun in a SLURM environment; the TorchElasticEnvironment now gets chosen over the SLURMEnvironment if both are detected (#18618)

  • If not set by the user, Lightning will set OMP_NUM_THREADS to num_cpus / num_processes when launching subprocesses (e.g. when DDP is used) to avoid system overload for CPU-intensive tasks (#18677)
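
The OMP_NUM_THREADS default described above amounts to dividing the available CPUs among the launched processes. A minimal sketch follows; rounding down and the floor of one thread are assumptions, and both function names are hypothetical.

```python
import os


def default_omp_num_threads(num_processes, num_cpus=None):
    """Return a per-process thread count of roughly num_cpus / num_processes."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    # Assumption: round down, but never go below one thread per process.
    return max(1, num_cpus // num_processes)


def apply_default(env, num_processes, num_cpus=None):
    """Set OMP_NUM_THREADS in env only when the user has not set it already."""
    env.setdefault("OMP_NUM_THREADS", str(default_omp_num_threads(num_processes, num_cpus)))
    return env


# 16 CPUs shared by 4 processes: each process gets 4 threads.
threads = default_omp_num_threads(num_processes=4, num_cpus=16)
```

Setting OMP_NUM_THREADS explicitly in the environment before launching keeps the user's value, as the entry states.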

[2.1.0] - Deprecated

  • Deprecated the DDPStrategy.is_distributed property. This strategy is distributed by definition (#17381)

  • Deprecated the SingleTPUStrategy (strategy="single_tpu") in favor of SingleDeviceXLAStrategy (strategy="single_xla") (#17383)

  • Deprecated the TPUAccelerator in favor of XLAAccelerator (#17383)

  • Deprecated the TPUPrecision in favor of XLAPrecision (#17383)

  • Deprecated the TPUBf16Precision in favor of XLABf16Precision (#17383)

[2.1.0] - Removed

  • Removed automatic sharding support with Fabric.run or using fabric.launch(fn). This only impacts FSDP and DeepSpeed strategy users. Please instantiate your module under the newly added fabric.init_module context manager (#17832)

  • Removed the unsupported checkpoint_io argument from the FSDPStrategy (#18192)

[2.1.0] - Fixed

  • Fixed issue where running on TPUs would select the wrong device index (#17227)

  • Removed the need to call .launch() when using the DP-strategy (strategy="dp") (#17931)

  • Fixed FSDP re-applying activation checkpointing when the user had manually applied it already (#18006)

  • Fixed FSDP re-wrapping the module root when the user had manually wrapped the model (#18054)

  • Fixed issue where unexpected exceptions would leave the default torch dtype modified when using true precision settings (#18500)

  • Fixed redundant input-type casting in FSDP precision (#18630)

  • Fixed an issue with find_usable_cuda_devices(0) incorrectly returning a list of devices (#18722)

  • Fixed redundant file writes in CSVLogger (#18567)

[2.0.9] - 2023-09-14

[2.0.9] - Fixed

  • Fixed an issue causing the _FabricOptimizer.state to remain outdated after loading with load_state_dict (#18488)

[2.0.8] - 2023-08-29

[2.0.8] - Changed

  • On XLA, avoid setting the global rank before processes have been launched as this will initialize the PJRT computation client in the main process (#16966)

[2.0.8] - Fixed

  • Fixed model parameters getting shared between processes when running with strategy="ddp_spawn" and accelerator="cpu"; this has a necessary memory impact, as parameters are replicated for each process now (#18238)

  • Removed false positive warning when using fabric.no_backward_sync with XLA strategies (#17761)

  • Fixed issue where Fabric would not initialize the global rank, world size, and rank-zero-only rank after initialization and before launch (#16966)

  • Fixed FSDP full-precision param_dtype training (16-mixed, bf16-mixed and 32-true configurations) to avoid FSDP assertion errors with PyTorch < 2.0 (#18278)

[2.0.7] - 2023-08-14

[2.0.7] - Changed

  • Disabled the auto-detection of the Kubeflow environment (#18137)

[2.0.7] - Fixed

  • Fixed issue where DDP subprocesses that used Hydra would set hydra’s working directory to current directory (#18145)

  • Fixed an issue that would prevent the user from setting the multiprocessing start method after importing lightning (#18177)

  • Fixed an issue with Fabric.all_reduce() not performing an inplace operation for all backends consistently (#18235)

[2.0.6] - 2023-07-20

[2.0.6] - Fixed

  • Fixed TensorBoardLogger.log_graph not unwrapping the _FabricModule (#17844)

[2.0.5] - 2023-07-07

[2.0.5] - Added

  • Added validation against misconfigured device selection when using the DeepSpeed strategy (#17952)

[2.0.5] - Changed

  • Avoid info message when loading 0 entry point callbacks (#17990)

[2.0.5] - Fixed

  • Fixed the emission of a false-positive warning when calling a method on the Fabric-wrapped module that accepts no arguments (#17875)

  • Fixed check for FSDP’s flat parameters in all parameter groups (#17914)

  • Fixed automatic step tracking in Fabric’s CSVLogger (#17942)

  • Fixed an issue causing the torch.set_float32_matmul_precision info message to show multiple times (#17960)

  • Fixed loading model state when Fabric.load() is called after Fabric.setup() (#17997)

[2.0.4] - 2023-06-22

[2.0.4] - Fixed

  • Fixed validation of parameters of plugins.precision.MixedPrecision (#17687)

  • Fixed an issue with hpu imports leading to performance degradation (#17788)

  • Fixed computing the next version folder in CSVLogger (#17139)

[2.0.3] - 2023-06-07

[2.0.3] - Added

  • Added support for Callback registration through entry points (#17756)

[2.0.3] - Changed

  • Made type hints public (#17100)

  • Support compiling a module after it was set up by Fabric (#17529)

[2.0.3] - Fixed

  • Fixed computing the next version folder in CSVLogger (#17139)

  • Fixed inconsistent settings for FSDP Precision (#17670)
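
Computing the next version folder, the subject of the CSVLogger fix listed above, can be sketched as a directory scan. The version_N folder naming and the helper name next_version are assumptions made for illustration. This is not the logger's actual code.

```python
import os
import re
import tempfile


def next_version(root_dir):
    """Return the next free integer N for a version_N folder under root_dir."""
    if not os.path.isdir(root_dir):
        return 0
    versions = []
    for name in os.listdir(root_dir):
        match = re.fullmatch(r"version_(\d+)", name)
        # Only directories that match the naming pattern exactly are counted.
        if match and os.path.isdir(os.path.join(root_dir, name)):
            versions.append(int(match.group(1)))
    return max(versions) + 1 if versions else 0


# Existing runs 0 and 3 plus an unrelated folder: the next version is 4.
with tempfile.TemporaryDirectory() as root:
    for name in ("version_0", "version_3", "notes"):
        os.mkdir(os.path.join(root, name))
    found = next_version(root)
```

Taking the maximum rather than the folder count avoids reusing a number when earlier version folders have been deleted.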

[2.0.2] - 2023-04-24

[2.0.2] - Changed

  • Enabled precision autocast for LightningModule step methods in Fabric (#17439)

[2.0.2] - Fixed

  • Fixed an issue with LightningModule.*_step methods bypassing the DDP/FSDP wrapper (#17424)

  • Fixed device handling in Fabric.setup() when the model has no parameters (#17441)

[2.0.1] - 2023-03-30

[2.0.1] - Changed

  • Generalized Optimizer validation to accommodate both FSDP 1.x and 2.x (#16733)

[2.0.0] - 2023-03-15

[2.0.0] - Added

  • Added Fabric.all_reduce (#16459)

  • Added support for saving and loading DeepSpeed checkpoints through Fabric.save/load() (#16452)

  • Added support for automatically calling set_epoch on the dataloader.batch_sampler.sampler (#16841)

  • Added support for writing logs to remote file systems with the CSVLogger (#16880)

  • Added support for frozen dataclasses in the optimizer state (#16656)

  • Added lightning.fabric.is_wrapped to check whether a module, optimizer, or dataloader was already wrapped by Fabric (#16953)

[2.0.0] - Changed

  • Fabric now chooses accelerator="auto", strategy="auto", devices="auto" as defaults (#16842)

  • Checkpoint saving and loading redesign (#16434)

    • Changed the method signature of Fabric.save and Fabric.load

    • Changed the method signature of Strategy.save_checkpoint and Fabric.load_checkpoint

    • Fabric.save accepts a state that can contain model and optimizer references

    • Fabric.load can now load state in-place onto models and optimizers

    • Fabric.load returns a dictionary of objects that weren’t loaded into the state

    • Strategy.save_checkpoint and Fabric.load_checkpoint are now responsible for accessing the state of the model and optimizers

  • DataParallelStrategy.get_module_state_dict() and DDPStrategy.get_module_state_dict() now correctly extract the state dict without keys prefixed with ‘module’ (#16487)

  • “Native” suffix removal (#16490)

    • strategy="fsdp_full_shard_offload" is now strategy="fsdp_cpu_offload"

    • lightning.fabric.plugins.precision.native_amp is now lightning.fabric.plugins.precision.amp

  • Enabled all shorthand strategy names that can be supported in the CLI (#16485)

  • Renamed strategy='tpu_spawn' to strategy='xla' and strategy='tpu_spawn_debug' to strategy='xla_debug' (#16781)

  • Changed arguments for precision settings (from [64|32|16|bf16] to ["64-true"|"32-true"|"16-mixed"|"bf16-mixed"]) (#16767)

  • The selection Fabric(strategy="ddp_spawn", ...) no longer falls back to “ddp” when a cluster environment gets detected (#16780)

  • Renamed setup_dataloaders(replace_sampler=...) to setup_dataloaders(use_distributed_sampler=...) (#16829)
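
The precision and strategy renames in this section can be collected into a small migration helper for older scripts. The mappings are copied from the entries above; the helper name migrate_fabric_kwargs is hypothetical and not part of the library.

```python
# Old -> new values, taken from the 2.0.0 "Changed" entries above.
PRECISION_RENAMES = {
    "64": "64-true",
    "32": "32-true",
    "16": "16-mixed",
    "bf16": "bf16-mixed",
}
STRATEGY_RENAMES = {
    "fsdp_full_shard_offload": "fsdp_cpu_offload",
    "tpu_spawn": "xla",
    "tpu_spawn_debug": "xla_debug",
}


def migrate_fabric_kwargs(kwargs):
    """Return a copy of kwargs with pre-2.0 precision/strategy values renamed."""
    migrated = dict(kwargs)
    if "precision" in migrated:
        # Old values could be ints (16) or strings ("bf16"); compare as strings.
        old = str(migrated["precision"])
        migrated["precision"] = PRECISION_RENAMES.get(old, migrated["precision"])
    strategy = migrated.get("strategy")
    if isinstance(strategy, str):
        # Strategy objects are left untouched; only string shorthands are renamed.
        migrated["strategy"] = STRATEGY_RENAMES.get(strategy, strategy)
    return migrated


new_kwargs = migrate_fabric_kwargs({"precision": 16, "strategy": "tpu_spawn", "devices": 8})
```

Values that are already in the new form pass through unchanged, so the helper is safe to apply more than once.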

[2.0.0] - Removed

  • Removed support for PyTorch 1.10 (#16492)

  • Removed support for Python 3.7 (#16579)

[2.0.0] - Fixed

  • Fixed issue where the wrapped dataloader iter() would be called twice (#16841)

  • Improved the error message for installing tensorboard or tensorboardx (#17053)

[1.9.4] - 2023-03-01

[1.9.4] - Added

  • Added Fabric(strategy="auto") support (#16916)

[1.9.4] - Fixed

  • Fixed edge cases in parsing device ids using NVML (#16795)

  • Fixed DDP spawn hang on TPU Pods (#16844)

  • Fixed an error when passing find_usable_cuda_devices(num_devices=-1) (#16866)

[1.9.3] - 2023-02-21

[1.9.3] - Fixed

  • Fixed an issue causing a wrong environment plugin to be selected when accelerator=tpu and devices > 1 (#16806)

  • Fixed parsing of defaults for --accelerator and --precision in Fabric CLI when accelerator and precision are set to non-default values in the code (#16818)

[1.9.2] - 2023-02-15

[1.9.2] - Fixed

  • Fixed an attribute error and improved input validation for invalid strategy types being passed to Trainer (#16693)

[1.9.1] - 2023-02-10

[1.9.1] - Fixed

  • Fixed error handling for accelerator="mps" and ddp strategy pairing (#16455)

  • Fixed strict availability check for torch_xla requirement (#16476)

  • Fixed an issue where PL would wrap DataLoaders with XLA’s MpDeviceLoader more than once (#16571)

  • Fixed the batch_sampler reference for DataLoaders wrapped with XLA’s MpDeviceLoader (#16571)

  • Fixed an import error when torch.distributed is not available (#16658)

[1.9.0] - 2023-01-17

[1.9.0] - Added

  • Added Fabric.launch() to programmatically launch processes (e.g. in Jupyter notebook) (#14992)

  • Added the option to launch Fabric scripts from the CLI, without the need to wrap the code into the run method (#14992)

  • Added Fabric.setup_module() and Fabric.setup_optimizers() to support strategies that need to set up the model before an optimizer can be created (#15185)

  • Added support for Fully Sharded Data Parallel (FSDP) training in Lightning Lite (#14967)

  • Added lightning.fabric.accelerators.find_usable_cuda_devices utility function (#16147)

  • Added basic support for LightningModules (#16048)

  • Added support for managing callbacks via Fabric(callbacks=...) and emitting events through Fabric.call() (#16074)

  • Added Logger support (#16121)

    • Added Fabric(loggers=...) to support different Logger frameworks in Fabric

    • Added Fabric.log for logging scalars using multiple loggers

    • Added Fabric.log_dict for logging a dictionary of multiple metrics at once

    • Added Fabric.loggers and Fabric.logger attributes to access the individual logger instances

    • Added support for calling self.log and self.log_dict in a LightningModule when using Fabric

    • Added access to self.logger and self.loggers in a LightningModule when using Fabric

  • Added lightning.fabric.loggers.TensorBoardLogger (#16121)

  • Added lightning.fabric.loggers.CSVLogger (#16346)

  • Added support for a consistent .zero_grad(set_to_none=...) on the wrapped optimizer regardless of which strategy is used (#16275)

[1.9.0] - Changed

  • Renamed the class LightningLite to Fabric (#15932, #15938)

  • The Fabric.run() method is no longer abstract (#14992)

  • The XLAStrategy now inherits from ParallelStrategy instead of DDPSpawnStrategy (#15838)

  • Merged the implementation of DDPSpawnStrategy into DDPStrategy and removed DDPSpawnStrategy (#14952)

  • The dataloader wrapper returned from .setup_dataloaders() now calls .set_epoch() on the distributed sampler if one is used (#16101)

  • Renamed Strategy.reduce to Strategy.all_reduce in all strategies (#16370)

  • When using multiple devices, the strategy now defaults to “ddp” instead of “ddp_spawn” when none is set (#16388)
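
The set_epoch behavior described for the dataloader wrapper above matters because a shuffling sampler typically derives its permutation from a seed plus the current epoch. The toy sketch below shows the pattern with the standard library only; ToyShufflingSampler and EpochAwareLoader are hypothetical names and are not Fabric's classes.

```python
import random


class ToyShufflingSampler:
    """Hypothetical sampler whose shuffle order depends on seed + epoch."""

    def __init__(self, size, seed=0):
        self.size = size
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch):
        self.epoch = epoch

    def __iter__(self):
        indices = list(range(self.size))
        # A fresh generator per epoch keeps the order reproducible.
        random.Random(self.seed + self.epoch).shuffle(indices)
        return iter(indices)


class EpochAwareLoader:
    """Hypothetical wrapper that advances the sampler epoch on every iteration."""

    def __init__(self, sampler):
        self.sampler = sampler
        self._num_iter_calls = 0

    def __iter__(self):
        # Tell the sampler which pass over the data this is before iterating.
        self.sampler.set_epoch(self._num_iter_calls)
        self._num_iter_calls += 1
        return iter(self.sampler)


sampler = ToyShufflingSampler(size=6)
loader = EpochAwareLoader(sampler)
first_epoch = list(loader)
epoch_after_first = sampler.epoch
second_epoch = list(loader)
epoch_after_second = sampler.epoch
```

Without the set_epoch call, every pass would reuse the epoch-0 seed and therefore the same order.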

[1.9.0] - Removed

  • Removed support for FairScale’s sharded training (strategy='ddp_sharded'|'ddp_sharded_spawn'). Use Fully-Sharded Data Parallel instead (strategy='fsdp') (#16329)

[1.9.0] - Fixed

  • Restored sampling parity between PyTorch and Fabric dataloaders when using the DistributedSampler (#16101)

  • Fixed an issue where the error message wouldn’t tell the user the real value that was passed through the CLI (#16334)

[1.8.6] - 2022-12-21

  • minor cleaning

[1.8.5] - 2022-12-15

  • minor cleaning

[1.8.4] - 2022-12-08

[1.8.4] - Fixed

  • Fixed shuffle=False having no effect when using DDP/DistributedSampler (#15931)

[1.8.3] - 2022-11-22

[1.8.3] - Changed

  • Temporarily removed support for Hydra multi-run (#15737)

[1.8.2] - 2022-11-17

[1.8.2] - Fixed

  • Fixed the automatic fallback from LightningLite(strategy="ddp_spawn", ...) to LightningLite(strategy="ddp", ...) when on an LSF cluster (#15103)

[1.8.1] - 2022-11-10

[1.8.1] - Fixed

  • Fixed an issue with the SLURM srun detection causing permission errors (#15485)

  • Fixed the import of lightning_lite causing a warning ‘Redirects are currently not supported in Windows or MacOs’ (#15610)