Saving and loading weights¶

Lightning can automate saving and loading checkpoints.

Checkpoint saving¶

A Lightning checkpoint has everything needed to restore a training session including:

16-bit scaling factor (apex)
Current epoch
Global step
Model state_dict
State of all optimizers
State of all learningRate schedulers
State of all callbacks
The hyperparameters used for that model if passed in as hparams (Argparse.Namespace)

Automatic saving¶

Checkpointing is enabled by default to the current working directory. To change the checkpoint path pass in:

trainer = Trainer(default_root_dir='/your/path/to/save/checkpoints')

To modify the behavior of checkpointing pass in your own callback.

from pytorch_lightning.callbacks import ModelCheckpoint

# DEFAULTS used by the Trainer
checkpoint_callback = ModelCheckpoint(
    filepath=os.getcwd(),
    save_top_k=1,
    verbose=True,
    monitor='checkpoint_on',
    mode='min',
    prefix=''
)

trainer = Trainer(checkpoint_callback=checkpoint_callback)

Or disable it by passing

trainer = Trainer(checkpoint_callback=False)

The Lightning checkpoint also saves the arguments passed into the LightningModule init under the module_arguments key in the checkpoint.

class MyLightningModule(LightningModule):

   def __init__(self, learning_rate, *args, **kwargs):
        super().__init__()

# all init args were saved to the checkpoint
checkpoint = torch.load(CKPT_PATH)
print(checkpoint['module_arguments'])
# {'learning_rate': the_value}

Manual saving¶

You can manually save checkpoints and restore your model from the checkpointed state.

model = MyLightningModule(hparams)
trainer.fit(model)
trainer.save_checkpoint("example.ckpt")
new_model = MyModel.load_from_checkpoint(checkpoint_path="example.ckpt")

Checkpoint Loading¶

To load a model along with its weights, biases and module_arguments use following method.

model = MyLightingModule.load_from_checkpoint(PATH)

print(model.learning_rate)
# prints the learning_rate you used in this checkpoint

model.eval()
y_hat = model(x)

But if you don’t want to use the values saved in the checkpoint, pass in your own here

class LitModel(LightningModule):

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.save_hyperparameters()
        self.l1 = nn.Linear(self.hparams.in_dim, self.hparams.out_dim)

you can restore the model like this

# if you train and save the model like this it will use these values when loading
# the weights. But you can overwrite this
LitModel(in_dim=32, out_dim=10)

# uses in_dim=32, out_dim=10
model = LitModel.load_from_checkpoint(PATH)

# uses in_dim=128, out_dim=10
model = LitModel.load_from_checkpoint(PATH, in_dim=128, out_dim=10)

Restoring Training State¶

If you don’t just want to load weights, but instead restore the full training, do the following:

model = LitModel()
trainer = Trainer(resume_from_checkpoint='some/path/to/my_checkpoint.ckpt')

# automatically restores model, epoch, step, LR schedulers, apex, etc...
trainer.fit(model)