Habana Gaudi AI Processor (HPU)
Lightning supports the Habana Gaudi AI Processor (HPU) for accelerating deep learning training workloads.
Habana® Gaudi® AI training processors are built on a heterogeneous architecture with a cluster of fully programmable Tensor Processing Cores (TPC) along with its associated development tools and libraries, and a configurable Matrix Math engine.
The TPC core is a VLIW SIMD processor with an instruction set and hardware tailored to serve training workloads efficiently. The Gaudi memory architecture includes on-die SRAM and local memories in each TPC, and Gaudi is the first DL training processor that has integrated RDMA over Converged Ethernet (RoCE v2) engines on-chip.
On the software side, the PyTorch Habana bridge interfaces between the framework and the SynapseAI software stack to enable the execution of deep learning models on the Habana Gaudi device.
Gaudi offers a substantial price/performance advantage – so you get to do more deep learning training while spending less.
How to access HPUs
Check out the Getting Started Guide with AWS and Habana.
Training with HPUs
To enable PyTorch Lightning to use the HPU accelerator, pass the accelerator="hpu" parameter to the Trainer class.
trainer = Trainer(accelerator="hpu")
Passing devices=1 and accelerator="hpu" to the Trainer class enables the Habana accelerator for single Gaudi training.
trainer = Trainer(devices=1, accelerator="hpu")
Passing devices=8 and accelerator="hpu" to the Trainer class enables the Habana accelerator for distributed training with 8 Gaudis.
It uses HPUParallelStrategy internally, which is based on the DDP strategy with the addition of Habana's collective communication library (HCCL) to support scale-up within a node and scale-out across multiple nodes.
trainer = Trainer(devices=8, accelerator="hpu")
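The single-Gaudi and 8-Gaudi calls above differ only in the devices value. As a minimal sketch, the helper below builds the keyword arguments for either case. Note that hpu_trainer_kwargs is a hypothetical name introduced here for illustration and is not part of Lightning; it only constructs a dictionary, so it runs without Gaudi hardware.

```python
from typing import Any, Dict


def hpu_trainer_kwargs(devices: int) -> Dict[str, Any]:
    """Hypothetical helper (not a Lightning API): Trainer kwargs for HPU training."""
    if not isinstance(devices, int) or devices < 1:
        raise ValueError("devices must be a positive integer")
    # accelerator="hpu" selects the Habana accelerator; devices picks how many Gaudis.
    return {"accelerator": "hpu", "devices": devices}


# Usage (requires a Lightning build with HPU support and Gaudi hardware):
#   from pytorch_lightning import Trainer
#   trainer = Trainer(**hpu_trainer_kwargs(8))
```

Keeping the flags in one place makes it harder for a single-device script and a distributed script to drift apart.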
If the devices flag is not defined, it will assume devices to be "auto" and select 8 Gaudi devices.
Mixed Precision Plugin
Lightning also allows mixed precision training with HPUs.
By default, HPU training will use 32-bit precision. To enable mixed precision, set the precision flag.
trainer = Trainer(devices=1, accelerator="hpu", precision=16)
Enabling Mixed Precision Options
HPUPrecisionPlugin uses the Habana Mixed Precision (HMP) package to enable mixed precision training.
You can execute the ops in FP32 or BF16 precision. The HMP package modifies the Python operators to add the appropriate cast operations for the arguments before execution. The default settings let users enable mixed precision training with minimal code.
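To make the idea of "adding cast operations to the arguments before execution" concrete, here is a toy, pure-Python sketch. It is not how HMP is implemented: with_casts, to_bf16, and to_fp32 are hypothetical names, scalars stand in for tensors, and BF16 is emulated by truncating a float32 to its top 16 bits (real conversions typically round instead).

```python
import struct
from typing import Callable, Set


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Emulate BF16 by keeping only the top 16 bits of the float32 encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def with_casts(name: str, fn: Callable, bf16_ops: Set[str], fp32_ops: Set[str]) -> Callable:
    """Wrap fn so its float arguments are cast according to the op lists."""
    if name in bf16_ops:
        cast = to_bf16
    elif name in fp32_ops:
        cast = to_fp32
    else:
        return fn  # op is in neither list: leave it untouched

    def wrapped(*args):
        # Cast only float arguments; everything else passes through unchanged.
        return fn(*(cast(a) if isinstance(a, float) else a for a in args))

    return wrapped


mul = with_casts("mul", lambda a, b: a * b, bf16_ops={"mul"}, fp32_ops={"sum"})
print(mul(3.14159, 2.0))  # 6.28125, because 3.14159 is truncated to 3.140625
```

The wrapper shows why the operator lists matter: the same call produces a lower-precision result purely because of which list its name appears in.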
In addition to the default settings in HMP, users can override these defaults and provide their own BF16 and FP32 operator lists by passing them as parameters to HPUPrecisionPlugin.
The snippet below trains an example MNIST model on a single Habana Gaudi device and uses HMP with the default parameters overridden, so that advanced users can supply their own BF16 and FP32 operator lists instead of the HMP defaults.
import pytorch_lightning as pl
from pytorch_lightning.plugins import HPUPrecisionPlugin

# Initialize a trainer with the HPU accelerator and HPU strategy for a single device,
# with mixed precision using overridden HMP settings
trainer = pl.Trainer(
    accelerator="hpu",
    devices=1,
    # Optional Habana mixed precision params to be set
    # Check out `pl_examples/hpu_examples/simple_mnist/ops_bf16_mnist.txt` for the format
    plugins=[
        HPUPrecisionPlugin(
            precision=16,
            opt_level="O1",
            verbose=False,
            bf16_file_path="ops_bf16_mnist.txt",
            fp32_file_path="ops_fp32_mnist.txt",
        )
    ],
)

# Init our model (LitClassifier is assumed to be defined elsewhere)
model = LitClassifier()

# Init the data (MNISTDataModule and batch_size are assumed to be defined elsewhere)
dm = MNISTDataModule(batch_size=batch_size)

# Train the model ⚡
trainer.fit(model, datamodule=dm)
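The two operator list files referenced above have to exist before training starts. The sketch below writes such files, assuming a format of one operator name per line; that format is an assumption here and should be checked against the example file named in the snippet. write_op_list is a hypothetical helper, and the operator names are illustrative only.

```python
from pathlib import Path
from typing import Iterable, List


def write_op_list(path: str, ops: Iterable[str]) -> List[str]:
    """Write operator names to path, one per line (assumed format)."""
    # Drop surrounding whitespace and skip empty entries.
    names = [op.strip() for op in ops if op.strip()]
    Path(path).write_text("\n".join(names) + "\n", encoding="utf-8")
    return names


if __name__ == "__main__":
    # Illustrative operator names, not a recommended split.
    write_op_list("ops_bf16_mnist.txt", ["linear", "relu"])
    write_op_list("ops_fp32_mnist.txt", ["cross_entropy"])
```

Generating the files from Python keeps the operator lists next to the training script, where they are easier to review and version.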
For more details, please refer to PyTorch Mixed Precision Training on Gaudi.