Debugging¶
The following are flags that make debugging much easier.
Fast dev run¶
This flag runs a “unit test” of your code: 1 training batch and 1 validation batch. The point is to catch any bugs in the training/validation loop without waiting for a full epoch to crash.
(See: fast_dev_run argument of Trainer)
trainer = Trainer(fast_dev_run=True)
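For example, a minimal sketch of the full workflow, where MyModel stands in for your own LightningModule:
# MyModel is a placeholder for your own LightningModule
model = MyModel()

# runs 1 training batch and 1 validation batch, then exits
trainer = Trainer(fast_dev_run=True)
trainer.fit(model)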
Inspect gradient norms¶
Logs (to a logger) the norm of the gradients for each weight matrix.
(See: track_grad_norm argument of Trainer)
# the 2-norm
trainer = Trainer(track_grad_norm=2)
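Any p-norm can be tracked. Setting the flag to -1 (the default) disables tracking:
# the 1-norm
trainer = Trainer(track_grad_norm=1)

# DEFAULT (no tracking)
trainer = Trainer(track_grad_norm=-1)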
Log GPU usage¶
Logs (to a logger) the GPU memory usage of each GPU on the master node.
(See: log_gpu_memory argument of Trainer)
# log memory usage of every GPU
trainer = Trainer(log_gpu_memory='all')
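The flag also accepts 'min_max', which logs only the minimum and maximum memory usage across GPUs to keep logging overhead low:
# log only the min/max GPU memory usage
trainer = Trainer(log_gpu_memory='min_max')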
Make model overfit on subset of data¶
A good debugging technique is to take a tiny portion of your data (say 2 samples per class) and try to get your model to overfit. If it can’t overfit, that’s a sign your model won’t work on the full dataset.
(See: overfit_pct argument of Trainer)
trainer = Trainer(overfit_pct=0.01)
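For example, a sketch where MyModel stands in for your own LightningModule. With overfit_pct=0.01, the same 1% of the data is used for the train, validation, and test splits:
# MyModel is a placeholder for your own LightningModule
model = MyModel()

# overfit on 1% of the data; the training loss should approach zero
trainer = Trainer(overfit_pct=0.01)
trainer.fit(model)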
Print the parameter count by layer¶
Whenever the .fit() function gets called, the Trainer will print the weights summary for the LightningModule. To disable this behavior, set this flag to None:
(See: weights_summary argument of Trainer)
trainer = Trainer(weights_summary=None)
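Depending on your Lightning version, the flag also accepts 'full' and 'top':
# print a summary of every module and submodule
trainer = Trainer(weights_summary='full')

# print only the top-level modules
trainer = Trainer(weights_summary='top')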
Set the number of validation sanity steps¶
Lightning runs a few steps of validation at the start of training. This catches any bugs in the validation loop without having to wait for a crash deep into a lengthy training run.
(See: num_sanity_val_steps argument of Trainer)
# DEFAULT
trainer = Trainer(num_sanity_val_steps=5)
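To skip the sanity check entirely, set the flag to 0:
# turn off the sanity check
trainer = Trainer(num_sanity_val_steps=0)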