# Compute scaling
When launching a server on the LEAP JupyterHub, you'll be asked to select a compute configuration. This guide helps you choose the right image and hardware resources (RAM and CPU/GPU) for your workflow.
## Image Types
Each image contains a different set of pre-installed software packages. Choose the image that fits your computing needs (a quick sanity check you can run after startup is sketched below the table):
| Image Name | Use When You Need... |
| --- | --- |
| Base Pangeo Notebook | General scientific stack (e.g., xarray, dask, matplotlib). Ideal for climate, ocean, and earth science workflows. |
| Pangeo PyTorch ML Notebook | PyTorch for machine learning. Runs on CPU or GPU depending on your hardware choice. |
| Pangeo TensorFlow ML Notebook | TensorFlow for machine learning. Runs on CPU or GPU depending on your hardware choice. |
| Other... | Enter a custom image URL. |
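After your server starts, it can be worth confirming that the image actually ships the libraries you plan to use. A minimal sketch, assuming a Python notebook; the package names below are examples, so substitute whatever your project imports:

```python
# Confirm the selected image has the packages your workflow needs.
# The package list here is illustrative -- swap in your own imports.
import importlib

for pkg in ["xarray", "dask", "matplotlib"]:
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(mod, '__version__', 'installed')}")
    except ImportError:
        print(f"{pkg}: not installed -- consider a different image")
```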
## Resource Options
Choose a CPU/GPU configuration based on the size of your data and the complexity of your tasks.
### CPU
Use this for data exploration, lightweight model runs, or debugging; a sketch for checking what your server actually received follows the table.
| Option | Use Case |
| --- | --- |
| ~8 GB, ~1.0 CPU | Small notebooks, light plotting, CSVs or small NetCDF files. |
| ~16–64 GB, ~2–8 CPU | Medium-sized xarray/dask workloads, ML prototyping. |
| ~128 GB, ~16 CPU | Large simulations, ensemble runs, or parallel workflows. |
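Once a CPU server is running, you can inspect its resources and size a local Dask cluster to match. A minimal sketch, assuming the Base Pangeo Notebook image (which ships `dask.distributed`); note that `os.cpu_count()` may report the whole node rather than your guaranteed share on Kubernetes-backed hubs, and the one-thread-per-worker split is just a common starting point, not a prescription:

```python
# Inspect the resources this server received, then start a local Dask
# cluster sized to match.
import os
from dask.distributed import Client, LocalCluster

n_cpus = os.cpu_count()  # may report the full node on Kubernetes hubs
print(f"CPUs visible to this server: {n_cpus}")

# One single-threaded worker per CPU is a common starting point for
# NumPy-heavy xarray workloads; tune this for your own task.
cluster = LocalCluster(n_workers=n_cpus, threads_per_worker=1)
client = Client(cluster)
print(client.dashboard_link)  # open this URL to watch tasks execute
```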
### GPU
Use this for training deep learning models or doing heavy inference (an availability check is sketched below the list).
- NVIDIA Tesla T4, 24 GB RAM, 8 CPUs
- Compatible with all images
- Greatly accelerates TensorFlow and PyTorch training
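Before launching a long training run, it is worth confirming the notebook can actually see the GPU. A minimal sketch for the PyTorch image, with the TensorFlow equivalent noted in a comment:

```python
# Check GPU visibility on the Pangeo PyTorch ML Notebook image.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))

# On the TensorFlow image, the equivalent check is:
#   import tensorflow as tf
#   print(tf.config.list_physical_devices("GPU"))
```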
### Why not use GPU by default?
While GPU can accelerate certain workloads, it's not always the best choice for the following reasons:
- **Most tasks don't benefit:** Plotting, pandas/xarray analysis, or basic modeling run just as fast (or faster) on CPUs.
- **Shared, limited resources:** GPUs are a shared resource across LEAP. Using them when not needed can block others who rely on them for large-scale work.
- **More costly:** GPUs cost significantly more than CPU resources.
> **Note:** Please use GPUs only if your workflow truly needs them.
## How to choose
Here is a simplified guide to choosing the appropriate image and compute configuration; a typical lazy-loading pattern for the xarray scenarios is sketched after the table:
| Scenario | Recommended Setup |
| --- | --- |
| Editing a notebook, small CSVs | Base Pangeo Notebook + 8 GB CPU |
| Plotting large NetCDF files with xarray | Base Pangeo Notebook + 16–64 GB CPU |
| Visualizing high-resolution model outputs | Base Pangeo Notebook + 16–64 GB CPU |
| Preprocessing large climate or satellite datasets | Base Pangeo Notebook + 64–128 GB CPU |
| Large parallel Dask workloads | Base Pangeo Notebook + 32–128 GB CPU |
| Interactive Dask dashboard or distributed workflows | Base Pangeo Notebook + 32–128 GB CPU |
| Running large batch inference | PyTorch/TensorFlow ML Notebook + GPU |
| Debugging or inference on smaller models | PyTorch/TensorFlow ML Notebook + CPU |
| Training a PyTorch model | PyTorch ML Notebook + GPU |
| Running a TensorFlow model at scale | TensorFlow ML Notebook + GPU |
| Fine-tuning pre-trained deep learning models | PyTorch/TensorFlow ML Notebook + GPU |
| Hyperparameter tuning / grid search (ML) | PyTorch/TensorFlow ML Notebook + GPU or Devs Only |
| Generating synthetic datasets or simulations | Base Pangeo Notebook + 16–128 GB CPU |
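For the xarray scenarios above, the usual pattern is to open data lazily with Dask chunks so it never has to fit in RAM all at once, which is what makes the 16–64 GB tiers viable for large files. A minimal sketch; the file path, variable name, and chunk sizes are placeholders for your own data:

```python
# Open a large NetCDF lazily; only metadata is read at this point.
import xarray as xr

ds = xr.open_dataset("your-data.nc", chunks={"time": 100})  # placeholder path
print(ds)

# Operations stay lazy until you ask for the result, which then streams
# chunk-by-chunk through Dask instead of loading everything into memory.
monthly_mean = ds["tas"].groupby("time.month").mean()  # "tas" is a placeholder
result = monthly_mean.compute()
```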
> **Tip:** If you are not sure which configuration to pick, start with the Base Pangeo Notebook + 8–16 GB CPU. You can always stop your server and restart with a different configuration.