Loggers

Lightning supports the most popular logging frameworks (TensorBoard, Comet, etc…). To use a logger, simply pass it into the Trainer. Lightning uses TensorBoard by default.

from pytorch_lightning import loggers as pl_loggers

tb_logger = pl_loggers.TensorBoardLogger('logs/')
trainer = Trainer(logger=tb_logger)

Choose from any of the others such as MLflow, Comet, Neptune, WandB, …

comet_logger = pl_loggers.CometLogger(save_dir='logs/')
trainer = Trainer(logger=comet_logger)

To use multiple loggers, simply pass in a list or tuple of loggers …

tb_logger = pl_loggers.TensorBoardLogger('logs/')
comet_logger = pl_loggers.CometLogger(save_dir='logs/')
trainer = Trainer(logger=[tb_logger, comet_logger])

Note

All loggers log by default to os.getcwd(). To change the path without creating a logger, set Trainer(default_root_dir='/your/path/to/save/checkpoints')


Logging from a LightningModule

Use the Result objects to log from any LightningModule.

Training loop logging

To log in the training loop, use the TrainResult.

def training_step(self, batch, batch_idx):
    loss = ...

    result = pl.TrainResult(minimize=loss)
    result.log('train_loss', loss)
    return result

The Result object is simply a dictionary that gives you added methods like log and write and automatically detaches tensors (except for the minimize value).

result = pl.TrainResult(minimize=loss)
result.log('train_loss', loss)
print(result)

{'train_loss': tensor([0.2262])}

The TrainResult can log at two places in the training loop: on each step (TrainResult(on_step=True)) and as an aggregate at the end of the epoch (TrainResult(on_epoch=True)).

for epoch in epochs:
    epoch_outs = []
    for batch in train_dataloader():
        # ......
        out = training_step(batch)
        # < ----------- log (on_step=True)
        epoch_outs.append(out)

    # < -------------- log (on_epoch=True)
    auto_reduce_log(epoch_outs)
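
Putting this together, here is a minimal sketch of step- and epoch-level logging (assuming log() accepts the on_step/on_epoch flags described above; compute_loss is an illustrative helper, not part of the API):

def training_step(self, batch, batch_idx):
    # compute_loss is an illustrative helper, not part of the API
    loss = self.compute_loss(batch)

    result = pl.TrainResult(minimize=loss)
    # log the raw value on every step and the auto-reduced epoch aggregate
    result.log('train_loss', loss, on_step=True, on_epoch=True)
    return result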

Validation loop logging

To log in the validation loop, use the EvalResult.

def validation_step(self, batch, batch_idx):
    loss = ...

    result = pl.EvalResult()
    result.log('val_loss', loss)
    return result

The EvalResult object is simply a dictionary that gives you added methods like log and write and automatically detaches tensors.

result = pl.EvalResult()
result.log('val_loss', loss)
print(result)

{'val_loss': tensor([0.2262])}

The EvalResult can log at two places in the validation loop: on each step (EvalResult(on_step=True)) and as an aggregate at the end of the epoch (EvalResult(on_epoch=True)).

def run_val_loop():
    epoch_outs = []
    for batch in val_dataloader():
        out = validation_step(batch)
        # < ----------- log (on_step=True)
        epoch_outs.append(out)

    # < -------------- log (on_epoch=True)
    auto_reduce_log(epoch_outs)
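
Analogously, a minimal validation sketch (the checkpoint_on argument is an assumption about the EvalResult constructor; compute_loss is an illustrative helper):

def validation_step(self, batch, batch_idx):
    # compute_loss is an illustrative helper, not part of the API
    loss = self.compute_loss(batch)

    # checkpoint_on is assumed: monitor this value when saving checkpoints
    result = pl.EvalResult(checkpoint_on=loss)
    result.log('val_loss', loss, on_epoch=True)
    return result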

Test loop logging

Test loop logging works the same way as validation loop logging; see the previous section.
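
For example, a test step mirrors a validation step (a sketch; compute_loss is an illustrative helper):

def test_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # illustrative helper

    result = pl.EvalResult()
    result.log('test_loss', loss)
    return result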

Manual logging

For certain things like histograms, text, images, etc… you may need to use the logger object directly.

def training_step(...):
    ...
    # the logger you used (in this case tensorboard)
    tensorboard = self.logger.experiment
    tensorboard.add_histogram(...)
    tensorboard.add_figure(...)

This also applies to Callbacks.


Logging from a Callback

To log from a callback, access the logger object directly

class MyCallback(Callback):

    def on_train_epoch_end(self, trainer, pl_module):
        tensorboard = pl_module.logger.experiment
        tensorboard.add_histogram(...)
        tensorboard.add_figure(...)

Make a Custom Logger

You can implement your own logger by writing a class that inherits from LightningLoggerBase. Use the rank_zero_only() decorator to make sure that only the first process in DDP training logs data.

from pytorch_lightning.utilities import rank_zero_only
from pytorch_lightning.loggers import LightningLoggerBase

class MyLogger(LightningLoggerBase):

    @rank_zero_only
    def log_hyperparams(self, params):
        # params is an argparse.Namespace
        # your code to record hyperparameters goes here
        pass

    @rank_zero_only
    def log_metrics(self, metrics, step):
        # metrics is a dictionary of metric names and values
        # your code to record metrics goes here
        pass

    def save(self):
        # Optional. Any code necessary to save logger data goes here
        pass

    @rank_zero_only
    def finalize(self, status):
        # Optional. Any code that needs to be run after training
        # finishes goes here
        pass
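
Once defined, a custom logger is passed to the Trainer like any built-in one:

custom_logger = MyLogger()
trainer = Trainer(logger=custom_logger)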

If you write a logger that may be useful to others, please send a pull request to add it to Lightning!


Supported Loggers

The following are the loggers we support:

Comet

class pytorch_lightning.loggers.comet.CometLogger(api_key=None, save_dir=None, workspace=None, project_name=None, rest_api_key=None, experiment_name=None, experiment_key=None, offline=False, **kwargs)[source]

Bases: pytorch_lightning.loggers.base.LightningLoggerBase

Log using Comet.ml. Install it with pip:

pip install comet-ml

Comet requires either an API Key (online mode) or a local directory path (offline mode).

ONLINE MODE

Example

>>> import os
>>> from pytorch_lightning import Trainer
>>> from pytorch_lightning.loggers import CometLogger
>>> # arguments made to CometLogger are passed on to the comet_ml.Experiment class
>>> comet_logger = CometLogger(
...     api_key=os.environ.get('COMET_API_KEY'),
...     workspace=os.environ.get('COMET_WORKSPACE'),  # Optional
...     save_dir='.',  # Optional
...     project_name='default_project',  # Optional
...     rest_api_key=os.environ.get('COMET_REST_API_KEY'),  # Optional
...     experiment_name='default'  # Optional
... )
>>> trainer = Trainer(logger=comet_logger)

OFFLINE MODE

Example

>>> from pytorch_lightning.loggers import CometLogger
>>> # arguments made to CometLogger are passed on to the comet_ml.Experiment class
>>> comet_logger = CometLogger(
...     save_dir='.',
...     workspace=os.environ.get('COMET_WORKSPACE'),  # Optional
...     project_name='default_project',  # Optional
...     rest_api_key=os.environ.get('COMET_REST_API_KEY'),  # Optional
...     experiment_name='default'  # Optional
... )
>>> trainer = Trainer(logger=comet_logger)
Parameters
  • api_key (Optional[str]) – Required in online mode. API key, found on Comet.ml. If not given, this will be loaded from the environment variable COMET_API_KEY or ~/.comet.config if either exists.

  • save_dir (Optional[str]) – Required in offline mode. The path for the directory to save local comet logs. If given, this also sets the directory for saving checkpoints.

  • workspace (Optional[str]) – Optional. Name of workspace for this user.

  • project_name (Optional[str]) – Optional. Send your experiment to a specific project. Otherwise will be sent to Uncategorized Experiments. If the project name does not already exist, Comet.ml will create a new project.

  • rest_api_key (Optional[str]) – Optional. Rest API key found in Comet.ml settings. This is used to determine the version number.

  • experiment_name (Optional[str]) – Optional. String representing the name for this particular experiment on Comet.ml.

  • experiment_key (Optional[str]) – Optional. If set, restores from existing experiment.

  • offline (bool) – If api_key and save_dir are both given, this determines whether the experiment will be in online or offline mode. This is useful if you use save_dir to control the checkpoints directory and have a ~/.comet.config file but still want to run offline experiments.
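
As a sketch of that combination (values are illustrative):

comet_logger = CometLogger(
    api_key=os.environ.get('COMET_API_KEY'),
    save_dir='.',   # also used as the checkpoints directory
    offline=True,   # force offline mode even though an API key is available
)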

finalize(status)[source]

Calling self.experiment.end() stops that experiment from logging any more data to Comet. Therefore, if you need to log additional data (for example, metrics when testing your model after training, since CometLogger.finalize() is called once training ends), a new ExistingCometExperiment has to be created.

This happens automatically in the experiment() property when self._experiment is None, i.e. after self.reset_experiment().

Return type

None

log_hyperparams(params)[source]

Record hyperparameters.

Parameters

params (Union[Dict[str, Any], Namespace]) – Namespace containing the hyperparameters

Return type

None

log_metrics(metrics, step=None)[source]

Records metrics. This method logs metrics as soon as it receives them. If you want to aggregate metrics for one specific step, use the agg_and_log_metrics() method.

Parameters
  • metrics (Dict[str, Union[Tensor, float]]) – Dictionary with metric names as keys and measured quantities as values

  • step (Optional[int]) – Step number at which the metrics should be recorded

Return type

None

property experiment[source]

Actual Comet object. To use Comet features in your LightningModule do the following.

Example:

self.logger.experiment.some_comet_function()
Return type

BaseExperiment

property name[source]

Return the experiment name.

Return type

str

property save_dir[source]

Return the root directory where experiment logs get saved, or None if the logger does not save data locally.

Return type

Optional[str]

property version[source]

Return the experiment version.

Return type

str

CSVLogger

class pytorch_lightning.loggers.csv_logs.CSVLogger(save_dir, name='default', version=None)[source]

Bases: pytorch_lightning.loggers.base.LightningLoggerBase

Log to local file system in YAML and CSV format. Logs are saved to os.path.join(save_dir, name, version).

Example

>>> from pytorch_lightning import Trainer
>>> from pytorch_lightning.loggers import CSVLogger
>>> logger = CSVLogger("logs", name="my_exp_name")
>>> trainer = Trainer(logger=logger)
Parameters
  • save_dir (str) – Save directory

  • name (Optional[str]) – Experiment name. Defaults to 'default'.

  • version (Union[int, str, None]) – Experiment version. If version is not specified the logger inspects the save directory for existing versions, then automatically assigns the next available version.

finalize(status)[source]

Do any processing that is necessary to finalize an experiment.

Parameters

status (str) – Status that the experiment finished with (e.g. success, failed, aborted)

Return type

None

log_hyperparams(params)[source]

Record hyperparameters.

Parameters

params (Union[Dict[str, Any], Namespace]) – Namespace containing the hyperparameters

Return type

None

log_metrics(metrics, step=None)[source]

Records metrics. This method logs metrics as soon as it receives them. If you want to aggregate metrics for one specific step, use the agg_and_log_metrics() method.

Parameters
  • metrics (Dict[str, float]) – Dictionary with metric names as keys and measured quantities as values

  • step (Optional[int]) – Step number at which the metrics should be recorded

Return type

None

save()[source]

Save log data.

Return type

None

property experiment[source]

Actual ExperimentWriter object. To use ExperimentWriter features in your LightningModule do the following.

Example:

self.logger.experiment.some_experiment_writer_function()
Return type

ExperimentWriter

property log_dir[source]

The log directory for this run. By default, it is named 'version_${self.version}' but it can be overridden by passing a string value for the constructor’s version parameter instead of None or an int.

Return type

str
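
For example, passing a string version replaces the default directory name (the resulting paths are assumptions based on the layout described above):

# writes to logs/my_exp_name/run_42 instead of logs/my_exp_name/version_0
logger = CSVLogger("logs", name="my_exp_name", version="run_42")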

property name[source]

Return the experiment name.

Return type

str

property root_dir[source]

Parent directory for all checkpoint subdirectories. If the experiment name parameter is None or the empty string, no experiment subdirectory is used and the checkpoint will be saved in “save_dir/version_dir”

Return type

str

property save_dir[source]

Return the root directory where experiment logs get saved, or None if the logger does not save data locally.

Return type

Optional[str]

property version[source]

Return the experiment version.

Return type

int

MLFlow

class pytorch_lightning.loggers.mlflow.MLFlowLogger(experiment_name='default', tracking_uri=None, tags=None, save_dir='./mlruns')[source]

Bases: pytorch_lightning.loggers.base.LightningLoggerBase

Log using MLflow. Install it with pip:

pip install mlflow

Example

>>> from pytorch_lightning import Trainer
>>> from pytorch_lightning.loggers import MLFlowLogger
>>> mlf_logger = MLFlowLogger(
...     experiment_name="default",
...     tracking_uri="file:./ml-runs"
... )
>>> trainer = Trainer(logger=mlf_logger)

Use the logger anywhere in your LightningModule as follows:

>>> from pytorch_lightning import LightningModule
>>> class LitModel(LightningModule):
...     def training_step(self, batch, batch_idx):
...         # example
...         self.logger.experiment.whatever_ml_flow_supports(...)
...
...     def any_lightning_module_function_or_hook(self):
...         self.logger.experiment.whatever_ml_flow_supports(...)
Parameters
  • experiment_name (str) – The name of the experiment

  • tracking_uri (Optional[str]) – Address of local or remote tracking server. If not provided, defaults to file:<save_dir>.

  • tags (Optional[Dict[str, Any]]) – A dictionary tags for the experiment.

  • save_dir (Optional[str]) – A path to a local directory where the MLflow runs get saved. Defaults to ./mlruns if tracking_uri is not provided. Has no effect if tracking_uri is provided.
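
As a sketch of the save_dir default described above, a local directory can be passed instead of a tracking URI:

# tracking_uri falls back to file:./mlruns when only save_dir is given
mlf_logger = MLFlowLogger(experiment_name="default", save_dir="./mlruns")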

finalize(status='FINISHED')[source]

Do any processing that is necessary to finalize an experiment.

Parameters

status (str) – Status that the experiment finished with (e.g. success, failed, aborted)

Return type

None

log_hyperparams(params)[source]

Record hyperparameters.

Parameters

params (Union[Dict[str, Any], Namespace]) – Namespace containing the hyperparameters

Return type

None

log_metrics(metrics, step=None)[source]

Records metrics. This method logs metrics as soon as it receives them. If you want to aggregate metrics for one specific step, use the agg_and_log_metrics() method.

Parameters
  • metrics (Dict[str, float]) – Dictionary with metric names as keys and measured quantities as values

  • step (Optional[int]) – Step number at which the metrics should be recorded

Return type

None

property experiment[source]

Actual MLflow object. To use MLflow features in your LightningModule do the following.

Example:

self.logger.experiment.some_mlflow_function()
Return type

MlflowClient

property name[source]

Return the experiment name.

Return type

str

property save_dir[source]

The root file directory in which MLflow experiments are saved.

Return type

Optional[str]

Returns

Local path to the root experiment directory if the tracking uri is local. Otherwise returns None.

property version[source]

Return the experiment version.

Return type

str

Neptune

class pytorch_lightning.loggers.neptune.NeptuneLogger(api_key=None, project_name=None, close_after_fit=True, offline_mode=False, experiment_name=None, upload_source_files=None, params=None, properties=None, tags=None, **kwargs)[source]

Bases: pytorch_lightning.loggers.base.LightningLoggerBase

Log using Neptune. Install it with pip:

pip install neptune-client

The Neptune logger can be used in online mode or offline (silent) mode. To log experiment data in online mode, NeptuneLogger requires an API key. In offline mode, the logger does not connect to Neptune.

ONLINE MODE

Example

>>> from pytorch_lightning import Trainer
>>> from pytorch_lightning.loggers import NeptuneLogger
>>> # arguments made to NeptuneLogger are passed on to the neptune.experiments.Experiment class
>>> # We are using an api_key for the anonymous user "neptuner" but you can use your own.
>>> neptune_logger = NeptuneLogger(
...     api_key='ANONYMOUS',
...     project_name='shared/pytorch-lightning-integration',
...     experiment_name='default',  # Optional,
...     params={'max_epochs': 10},  # Optional,
...     tags=['pytorch-lightning', 'mlp']  # Optional,
... )
>>> trainer = Trainer(max_epochs=10, logger=neptune_logger)

OFFLINE MODE

Example

>>> from pytorch_lightning.loggers import NeptuneLogger
>>> # arguments made to NeptuneLogger are passed on to the neptune.experiments.Experiment class
>>> neptune_logger = NeptuneLogger(
...     offline_mode=True,
...     project_name='USER_NAME/PROJECT_NAME',
...     experiment_name='default',  # Optional,
...     params={'max_epochs': 10},  # Optional,
...     tags=['pytorch-lightning', 'mlp']  # Optional,
... )
>>> trainer = Trainer(max_epochs=10, logger=neptune_logger)

Use the logger anywhere in your LightningModule as follows:

>>> from pytorch_lightning import LightningModule
>>> class LitModel(LightningModule):
...     def training_step(self, batch, batch_idx):
...         # log metrics
...         self.logger.experiment.log_metric('acc_train', ...)
...         # log images
...         self.logger.experiment.log_image('worse_predictions', ...)
...         # log model checkpoint
...         self.logger.experiment.log_artifact('model_checkpoint.pt', ...)
...         self.logger.experiment.whatever_neptune_supports(...)
...
...     def any_lightning_module_function_or_hook(self):
...         self.logger.experiment.log_metric('acc_train', ...)
...         self.logger.experiment.log_image('worse_predictions', ...)
...         self.logger.experiment.log_artifact('model_checkpoint.pt', ...)
...         self.logger.experiment.whatever_neptune_supports(...)

If you want to log objects after the training is finished use close_after_fit=False:

neptune_logger = NeptuneLogger(
    ...
    close_after_fit=False,
    ...
)
trainer = Trainer(logger=neptune_logger)
trainer.fit(model)

# Log test metrics
trainer.test(model)

# Log additional metrics
from sklearn.metrics import accuracy_score

accuracy = accuracy_score(y_true, y_pred)
neptune_logger.experiment.log_metric('test_accuracy', accuracy)

# Log charts
from scikitplot.metrics import plot_confusion_matrix
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(16, 12))
plot_confusion_matrix(y_true, y_pred, ax=ax)
neptune_logger.experiment.log_image('confusion_matrix', fig)

# Save checkpoints folder
neptune_logger.experiment.log_artifact('my/checkpoints')

# When you are done, stop the experiment
neptune_logger.experiment.stop()

Parameters
  • api_key (Optional[str]) – Required in online mode. Neptune API token, found on https://neptune.ai. Read how to get your API key. It is recommended to keep it in the NEPTUNE_API_TOKEN environment variable and then you can leave api_key=None.

  • project_name (Optional[str]) – Required in online mode. Qualified name of a project in the form “namespace/project_name”, for example “tom/mnist-classification”. If None, the value of the NEPTUNE_PROJECT environment variable will be taken. You need to create the project in https://neptune.ai first.

  • offline_mode (bool) – Optional. Default is False. If True, no logs will be sent to Neptune. Usually used for debugging purposes.

  • close_after_fit (Optional[bool]) – Optional. Default is True. If False, the experiment will not be closed after training, and additional metrics, images or artifacts can be logged. In that case, remember to close the experiment explicitly by running neptune_logger.experiment.stop().

  • experiment_name (Optional[str]) – Optional. Editable name of the experiment. Name is displayed in the experiment’s Details (Metadata section) and in experiments view as a column.

  • upload_source_files (Optional[List[str]]) – Optional. List of source files to be uploaded. Must be a list of str or a single str. Uploaded sources are displayed in the experiment’s Source code tab. If None is passed, the Python file from which the experiment was created will be uploaded. Pass an empty list ([]) to upload no files. Unix style pathname pattern expansion is supported. For example, you can pass '*.py' to upload all Python source files from the current directory. For recursive lookup use '**/*.py' (for Python 3.5 and later). For more information see the glob library. See the sketch after this list.

  • params (Optional[Dict[str, Any]]) – Optional. Parameters of the experiment. After experiment creation params are read-only. Parameters are displayed in the experiment’s Parameters section and each key-value pair can be viewed in the experiments view as a column.

  • properties (Optional[Dict[str, Any]]) – Optional. Default is {}. Properties of the experiment. They are editable after the experiment is created. Properties are displayed in the experiment’s Details section and each key-value pair can be viewed in the experiments view as a column.

  • tags (Optional[List[str]]) – Optional. Default is []. Must be list of str. Tags of the experiment. They are editable after the experiment is created (see: append_tag() and remove_tag()). Tags are displayed in the experiment’s Details section and can be viewed in the experiments view as a column.
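
A sketch of the source-upload patterns mentioned above (the other argument values are illustrative):

neptune_logger = NeptuneLogger(
    api_key='ANONYMOUS',
    project_name='shared/pytorch-lightning-integration',
    upload_source_files=['**/*.py'],  # recursively upload all Python sources
)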

append_tags(tags)[source]

Appends tags to the neptune experiment.

Parameters

tags (Union[str, Iterable[str]]) – Tags to add to the current experiment. If a str is passed, a single tag is added. If multiple comma-separated str are passed, all of them are added as tags. If a list of str is passed, all elements of the list are added as tags.

Return type

None
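
For example (tag values are illustrative):

neptune_logger.append_tags('baseline')          # a single tag
neptune_logger.append_tags(['resnet', 'adam'])  # several tags at once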

finalize(status)[source]

Do any processing that is necessary to finalize an experiment.

Parameters

status (str) – Status that the experiment finished with (e.g. success, failed, aborted)

Return type

None

log_artifact(artifact, destination=None)[source]

Save an artifact (file) in Neptune experiment storage.

Parameters
  • artifact (str) – A path to the file in local filesystem.

  • destination (Optional[str]) – Optional. Default is None. A destination path. If None is passed, an artifact file name will be used.

Return type

None

log_hyperparams(params)[source]

Record hyperparameters.

Parameters

params (Union[Dict[str, Any], Namespace]) – Namespace containing the hyperparameters

Return type

None

log_image(log_name, image, step=None)[source]

Log image data in Neptune experiments.

Parameters
  • log_name (str) – The name of log, i.e. bboxes, visualisations, sample_images.

  • image (Union[str, Any]) – The value of the log (data-point). Can be one of the following types: PIL image, matplotlib.figure.Figure, path to image file (str)

  • step (Optional[int]) – Step number at which the metrics should be recorded, must be strictly increasing

Return type

None

log_metric(metric_name, metric_value, step=None)[source]

Log metrics (numeric values) in Neptune experiments.

Parameters
  • metric_name (str) – The name of log, i.e. mse, loss, accuracy.

  • metric_value (Union[Tensor, float, str]) – The value of the log (data-point).

  • step (Optional[int]) – Step number at which the metrics should be recorded, must be strictly increasing

Return type

None

log_metrics(metrics, step=None)[source]

Log metrics (numeric values) in Neptune experiments.

Parameters
  • metrics (Dict[str, Union[Tensor, float]]) – Dictionary with metric names as keys and measured quantities as values

  • step (Optional[int]) – Step number at which the metrics should be recorded, must be strictly increasing

Return type

None

log_text(log_name, text, step=None)[source]

Log text data in Neptune experiments.

Parameters
  • log_name (str) – The name of log, i.e. mse, my_text_data, timing_info.

  • text (str) – The value of the log (data-point).

  • step (Optional[int]) – Step number at which the metrics should be recorded, must be strictly increasing

Return type

None

set_property(key, value)[source]

Set key-value pair as Neptune experiment property.

Parameters
  • key (str) – Property key.

  • value (Any) – New value of a property.

Return type

None

property experiment[source]

Actual Neptune object. To use neptune features in your LightningModule do the following.

Example:

self.logger.experiment.some_neptune_function()
Return type

Experiment

property name[source]

Return the experiment name.

Return type

str

property save_dir[source]

Return the root directory where experiment logs get saved, or None if the logger does not save data locally.

Return type

Optional[str]

property version[source]

Return the experiment version.

Return type

str

Tensorboard

class pytorch_lightning.loggers.tensorboard.TensorBoardLogger(save_dir, name='default', version=None, log_graph=True, **kwargs)[source]

Bases: pytorch_lightning.loggers.base.LightningLoggerBase

Log to local file system in TensorBoard format. Implemented using SummaryWriter. Logs are saved to os.path.join(save_dir, name, version). This is the default logger in Lightning; it comes preinstalled.

Example

>>> from pytorch_lightning import Trainer
>>> from pytorch_lightning.loggers import TensorBoardLogger
>>> logger = TensorBoardLogger("tb_logs", name="my_model")
>>> trainer = Trainer(logger=logger)
Parameters
  • save_dir (str) – Save directory

  • name (Optional[str]) – Experiment name. Defaults to 'default'. If it is the empty string then no per-experiment subdirectory is used.

  • version (Union[int, str, None]) – Experiment version. If version is not specified the logger inspects the save directory for existing versions, then automatically assigns the next available version. If it is a string then it is used as the run-specific subdirectory name, otherwise 'version_${version}' is used.

  • log_graph (bool) – Adds the computational graph to tensorboard. This requires that the user has defined the self.example_input_array attribute in their model; see the sketch after this list.

  • **kwargs – Other arguments are passed directly to the SummaryWriter constructor.
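
For instance, log_graph needs an example input stored on the model; a minimal sketch (the layer and input shape are illustrative):

import torch
from pytorch_lightning import LightningModule

class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)
        # required for log_graph=True; traced through forward()
        self.example_input_array = torch.rand(1, 28 * 28)

    def forward(self, x):
        return self.layer(x)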

finalize(status)[source]

Do any processing that is necessary to finalize an experiment.

Parameters

status (str) – Status that the experiment finished with (e.g. success, failed, aborted)

Return type

None

log_graph(model, input_array=None)[source]

Record model graph

Parameters
  • model (LightningModule) – lightning model

  • input_array – input passed to model.forward

log_hyperparams(params, metrics=None)[source]

Record hyperparameters.

Parameters

params (Union[Dict[str, Any], Namespace]) – Namespace containing the hyperparameters

Return type

None

log_metrics(metrics, step=None)[source]

Records metrics. This method logs metrics as soon as it receives them. If you want to aggregate metrics for one specific step, use the agg_and_log_metrics() method.

Parameters
  • metrics (Dict[str, float]) – Dictionary with metric names as keys and measured quantities as values

  • step (Optional[int]) – Step number at which the metrics should be recorded

Return type

None

save()[source]

Save log data.

Return type

None

property experiment[source]

Actual tensorboard object. To use TensorBoard features in your LightningModule do the following.

Example:

self.logger.experiment.some_tensorboard_function()
Return type

SummaryWriter

property log_dir[source]

The directory for this run’s tensorboard checkpoint. By default, it is named 'version_${self.version}' but it can be overridden by passing a string value for the constructor’s version parameter instead of None or an int.

Return type

str

property name[source]

Return the experiment name.

Return type

str

property root_dir[source]

Parent directory for all tensorboard checkpoint subdirectories. If the experiment name parameter is None or the empty string, no experiment subdirectory is used and the checkpoint will be saved in “save_dir/version_dir”

Return type

str

property save_dir[source]

Return the root directory where experiment logs get saved, or None if the logger does not save data locally.

Return type

Optional[str]

property version[source]

Return the experiment version.

Return type

int

Test-tube

class pytorch_lightning.loggers.test_tube.TestTubeLogger(save_dir, name='default', description=None, debug=False, version=None, create_git_tag=False, log_graph=True)[source]

Bases: pytorch_lightning.loggers.base.LightningLoggerBase

Log to local file system in TensorBoard format but using a nicer folder structure (see full docs). Install it with pip:

pip install test_tube

Example

>>> from pytorch_lightning import Trainer
>>> from pytorch_lightning.loggers import TestTubeLogger
>>> logger = TestTubeLogger("tt_logs", name="my_exp_name")
>>> trainer = Trainer(logger=logger)

Use the logger anywhere in your LightningModule as follows:

>>> from pytorch_lightning import LightningModule
>>> class LitModel(LightningModule):
...     def training_step(self, batch, batch_idx):
...         # example
...         self.logger.experiment.whatever_method_summary_writer_supports(...)
...
...     def any_lightning_module_function_or_hook(self):
...         self.logger.experiment.add_histogram(...)
Parameters
  • save_dir (str) – Save directory

  • name (str) – Experiment name. Defaults to 'default'.

  • description (Optional[str]) – A short snippet about this experiment

  • debug (bool) – If True, it doesn’t log anything; see the sketch after this list.

  • version (Optional[int]) – Experiment version. If version is not specified the logger inspects the save directory for existing versions, then automatically assigns the next available version.

  • create_git_tag (bool) – If True creates a git tag to save the code used in this experiment.

  • log_graph – Adds the computational graph to tensorboard. This requires that the user has defined the self.example_input_array attribute in their model.
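
As an illustration of the debugging-related flags above (argument values are assumptions):

# log nothing, but tag the current code state in git
logger = TestTubeLogger("tt_logs", name="my_exp_name", debug=True, create_git_tag=True)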

close()[source]

Do any cleanup that is necessary to close an experiment.

Return type

None

finalize(status)[source]

Do any processing that is necessary to finalize an experiment.

Parameters

status (str) – Status that the experiment finished with (e.g. success, failed, aborted)

Return type

None

log_graph(model, input_array=None)[source]

Record model graph

Parameters
  • model (LightningModule) – lightning model

  • input_array – input passed to model.forward

log_hyperparams(params)[source]

Record hyperparameters.

Parameters

params (Union[Dict[str, Any], Namespace]) – Namespace containing the hyperparameters

Return type

None

log_metrics(metrics, step=None)[source]

Records metrics. This method logs metrics as soon as it receives them. If you want to aggregate metrics for one specific step, use the agg_and_log_metrics() method.

Parameters
  • metrics (Dict[str, float]) – Dictionary with metric names as keys and measured quantities as values

  • step (Optional[int]) – Step number at which the metrics should be recorded

Return type

None

save()[source]

Save log data.

Return type

None

property experiment[source]

Actual TestTube object. To use TestTube features in your LightningModule do the following.

Example:

self.logger.experiment.some_test_tube_function()
Return type

Experiment

property name[source]

Return the experiment name.

Return type

str

property save_dir[source]

Return the root directory where experiment logs get saved, or None if the logger does not save data locally.

Return type

Optional[str]

property version[source]

Return the experiment version.

Return type

int

Weights and Biases

class pytorch_lightning.loggers.wandb.WandbLogger(name=None, save_dir=None, offline=False, id=None, anonymous=False, version=None, project=None, tags=None, log_model=False, experiment=None, entity=None, group=None)[source]

Bases: pytorch_lightning.loggers.base.LightningLoggerBase

Log using Weights and Biases. Install it with pip:

pip install wandb
Parameters
  • name (Optional[str]) – Display name for the run.

  • save_dir (Optional[str]) – Path where data is saved.

  • offline (bool) – Run offline (data can be streamed later to wandb servers).

  • id (Optional[str]) – Sets the version, mainly used to resume a previous run.

  • anonymous (bool) – Enables or explicitly disables anonymous logging.

  • version (Optional[str]) – Sets the version, mainly used to resume a previous run.

  • project (Optional[str]) – The name of the project to which this run will belong.

  • tags (Optional[List[str]]) – Tags associated with this run.

  • log_model (bool) – Save checkpoints in wandb dir to upload on W&B servers.

  • experiment – WandB experiment object

  • entity – The team posting this run (default: your username or your default team)

  • group (Optional[str]) – A unique string shared by all runs in a given group

Example

>>> from pytorch_lightning.loggers import WandbLogger
>>> from pytorch_lightning import Trainer
>>> wandb_logger = WandbLogger()
>>> trainer = Trainer(logger=wandb_logger)
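
To run offline or resume a previous run, the relevant arguments can be combined (all values shown are illustrative):

wandb_logger = WandbLogger(
    project='my_project',  # illustrative project name
    offline=True,          # stream the data to wandb servers later
    id='1a2b3c4d',         # illustrative id of the run to resume
)
trainer = Trainer(logger=wandb_logger)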

See also

  • Tutorial on how to use W&B with Pytorch Lightning.

log_hyperparams(params)[source]

Record hyperparameters.

Parameters

params (Union[Dict[str, Any], Namespace]) – Namespace containing the hyperparameters

Return type

None

log_metrics(metrics, step=None)[source]

Records metrics. This method logs metrics as soon as it receives them. If you want to aggregate metrics for one specific step, use the agg_and_log_metrics() method.

Parameters
  • metrics (Dict[str, float]) – Dictionary with metric names as keys and measured quantities as values

  • step (Optional[int]) – Step number at which the metrics should be recorded

Return type

None

property experiment[source]

Actual wandb object. To use wandb features in your LightningModule do the following.

Example:

self.logger.experiment.some_wandb_function()
Return type

Run

property name[source]

Return the experiment name.

Return type

Optional[str]

property save_dir[source]

Return the root directory where experiment logs get saved, or None if the logger does not save data locally.

Return type

Optional[str]

property version[source]

Return the experiment version.

Return type

Optional[str]