Train on the cloud (basic)

Audience: Anyone looking to train across many machines at once on the cloud.


Why do I need cloud training?

Training on the cloud is a cost-effective way to train your models faster by giving you access to powerful GPU machines.

For example, if your model takes 10 days to train on a CPU machine, here’s how cloud training can speed up your training time:

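Training speed vs cost

============  =============  ====================
Machine type  Training time  Cost (AWS 1 M60 GPU)
============  =============  ====================
CPU           10 days        $12.00
1 GPU         2 days         $11.52
2 GPU         1 day          $20.64
4 GPU         12 hours       $19.08
============  =============  ====================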
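To make the trade-off in the table easier to read, the following minimal sketch computes the speedup and the relative cost of each machine type against the CPU baseline. The durations and costs are copied from the table; the function name `summarize` and the derived "implied hourly rate" column are illustrative assumptions, not published AWS prices.

```python
# Figures copied from the "Training speed vs cost" table:
# (machine type, training time in hours, total cost in USD)
runs = [
    ("CPU", 10 * 24.0, 12.00),
    ("1 GPU", 2 * 24.0, 11.52),
    ("2 GPU", 1 * 24.0, 20.64),
    ("4 GPU", 12.0, 19.08),
]


def summarize(runs):
    """Compare every run against the first entry (the CPU baseline)."""
    base_hours, base_cost = runs[0][1], runs[0][2]
    rows = []
    for machine, hours, cost in runs:
        rows.append(
            {
                "machine": machine,
                "speedup": base_hours / hours,        # how many times faster than CPU
                "cost_ratio": cost / base_cost,       # total cost relative to CPU
                "implied_hourly_rate": cost / hours,  # derived from the table, not a quoted price
            }
        )
    return rows


for row in summarize(runs):
    print(
        f"{row['machine']:>5}: {row['speedup']:.0f}x faster, "
        f"{row['cost_ratio']:.2f}x the CPU cost, "
        f"${row['implied_hourly_rate']:.2f}/hour implied"
    )
```

Read this way, the single-GPU run finishes five times sooner than the CPU run for slightly less money, while the multi-GPU runs trade a higher total bill for a much shorter wait.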


Start a cloud machine in < 1 minute

Lightning has a native cloud solution with various products (lightning-grid) designed for researchers and ML practitioners in industry. To start an interactive machine, go to Lightning Grid, create a free account, and then start a new Grid Session.

A Grid Session is an interactive machine with 1-16 GPUs.

Start a Grid Session in a few seconds

Open the Jupyter Notebook

Once the Session starts, open a Jupyter notebook.
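As a quick sanity check in the notebook, you may want to confirm how many GPUs the machine actually exposes before launching a run. The sketch below is one possible approach, assuming the NVIDIA driver tool `nvidia-smi` is installed on the machine; the helper name `count_visible_gpus` is hypothetical and not part of Lightning or Grid.

```python
import shutil
import subprocess


def count_visible_gpus() -> int:
    """Return the number of GPUs listed by `nvidia-smi -L`, or 0 if none can be queried."""
    # Assumption: GPUs are NVIDIA devices and nvidia-smi is on the PATH.
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return 0
    # `nvidia-smi -L` prints one line per device, each starting with "GPU <index>:".
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))


print(f"Visible GPUs: {count_visible_gpus()}")
```

If the count is lower than the number of GPUs you requested for the Session, it is worth checking the Session configuration before starting a long training job.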


Clone and run your model

On the Jupyter page, you can either use a notebook or clone your code and run it via the CLI.


Cost

Lightning (via lightning-grid) gives the community access to cloud machines for free. However, you must buy credits on lightning-grid, which are used to pay the cloud providers on your behalf.

If you want to run on your own AWS account and pay the cloud provider directly, please contact our onprem team: onprem@pytorchlightning.ai


Next Steps

Here are the recommended next steps depending on your workflow.


© Copyright 2018-2022, Lightning AI et al. Revision dbb5ca8d.
