This applies to saving and loading checkpoints, as well as to logging. Working with different filesystems is as simple as prefixing file paths with a protocol such as "s3://" when writing and reading data.
from pytorch_lightning import Trainer

# `default_root_dir` is the default path used for logs and checkpoints
trainer = Trainer(default_root_dir="s3://my_bucket/data/")
trainer.fit(model)
You can pass custom remote paths to loggers to store logging data.
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import TensorBoardLogger

logger = TensorBoardLogger(save_dir="s3://my_bucket/logs/")
trainer = Trainer(logger=logger)
trainer.fit(model)
Additionally, you can resume training from a checkpoint stored on a remote filesystem.
trainer = Trainer(default_root_dir=tmpdir, max_steps=3)
trainer.fit(model, ckpt_path="s3://my_bucket/ckpts/classifier.ckpt")
PyTorch Lightning uses fsspec internally to handle all filesystem operations.
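For instance, any path that fsspec understands can be opened directly. Below is a minimal sketch of writing a file to a remote store; the bucket name my_bucket is a placeholder, and it assumes the s3fs backend is installed:

import fsspec

# fsspec picks the backend (here s3fs) based on the URL's protocol prefix,
# so the same code works for local paths, S3, GCS, etc.
with fsspec.open("s3://my_bucket/data/sample.txt", "w") as f:
    f.write("hello from fsspec")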
The most common filesystems supported by Lightning are:
Local filesystem:
file:// - It's the default and doesn't need any protocol to be used. It's installed by default in Lightning.

Amazon S3:
s3:// - Amazon S3 remote binary store, using the library s3fs. Run pip install fsspec[s3] to install it.

Google Cloud Storage:
gs:// - Google Cloud Storage, using gcsfs. Run pip install fsspec[gcs] to install it.

Microsoft Azure Storage:
az:// - Microsoft Azure Storage, using adlfs. Run pip install fsspec[adl] to install it.

Hadoop File System:
hdfs:// - Hadoop Distributed File System, using PyArrow as the backend. Run pip install fsspec[hdfs] to install it.
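To confirm that the extra for a given protocol is installed, you can ask fsspec for the corresponding filesystem. This is a small sketch, where the bucket name my_bucket is a placeholder:

import fsspec

# Raises ImportError if the required backend (e.g. s3fs) is not installed.
fs = fsspec.filesystem("s3")
print(fs.ls("my_bucket"))  # list objects under the placeholder bucket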
You can learn more about the available filesystems with:
from fsspec.registry import known_implementations

print(known_implementations)
You can also look into the CheckpointIO plugin for more details on how to customize saving and loading checkpoints.
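As a rough illustration, a custom CheckpointIO that routes all checkpoint I/O through fsspec could look like the sketch below. The class name FsspecCheckpointIO is hypothetical, not part of the Lightning API:

import fsspec
import torch
from pytorch_lightning import Trainer
from pytorch_lightning.plugins.io import CheckpointIO

class FsspecCheckpointIO(CheckpointIO):
    """Hypothetical plugin: read/write checkpoints through fsspec."""

    def save_checkpoint(self, checkpoint, path, storage_options=None):
        # Serialize the checkpoint dict to any fsspec-supported location.
        with fsspec.open(path, "wb") as f:
            torch.save(checkpoint, f)

    def load_checkpoint(self, path, map_location=None):
        with fsspec.open(path, "rb") as f:
            return torch.load(f, map_location=map_location)

    def remove_checkpoint(self, path):
        # Resolve the URL to a concrete filesystem and delete the object.
        fs, fs_path = fsspec.core.url_to_fs(path)
        fs.rm(fs_path)

trainer = Trainer(plugins=[FsspecCheckpointIO()])

Passing the plugin to the Trainer makes it handle every checkpoint save, load, and removal, regardless of which protocol the path uses.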