
GPU Stack Installation & CUDA Enablement

Use this guide to install the GPU-capable versions of PyTorch and TensorFlow, verify CUDA availability, and switch between CPU and GPU smoke tests.

1. Prerequisites

  • NVIDIA driver that supports the CUDA version you plan to use. Update through GeForce Experience or the NVIDIA App (Windows) or your distro’s driver manager (Linux).

  • CUDA toolkit (optional). Most pip wheels bundle the needed runtime, but native builds may require the toolkit/nvcc.

  • Python 3.10–3.12 in a virtual environment.
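The interpreter requirement above can be checked before anything is installed. The following is a minimal sketch, not part of the project; the helper names python_version_ok and in_virtualenv are hypothetical, and the version bounds are simply the 3.10–3.12 range listed above.

```python
import sys

SUPPORTED_MIN = (3, 10)  # lower bound from the prerequisites list
SUPPORTED_MAX = (3, 12)  # upper bound from the prerequisites list


def python_version_ok(version=None):
    """Return True if (major, minor) falls inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return SUPPORTED_MIN <= major_minor <= SUPPORTED_MAX


def in_virtualenv():
    """Return True when running inside a venv/virtualenv."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base interpreter.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python version supported:", python_version_ok())
    print("Inside a virtual environment:", in_virtualenv())
```

Run it with the interpreter you intend to use for the install; both lines should print True before you continue.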

2. PyTorch (CUDA) Installation

Pick the CUDA build that matches your driver/toolkit. Example for CUDA 12.1:

# Windows / Linux (same pip command; CUDA wheels are not published for macOS)
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121

Need CUDA 11.8 instead? Swap the wheel URL:

pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu118
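The two commands differ only in the wheel tag at the end of the index URL (cu121 for CUDA 12.1, cu118 for CUDA 11.8). As a hedged illustration of that pattern, the sketch below builds the command from a CUDA version string. The helper names are hypothetical, and the code does not check that a given tag is actually published for a given torch version, so treat the output as a candidate to verify against the PyTorch install matrix.

```python
import re

PYTORCH_WHEEL_BASE = "https://download.pytorch.org/whl"


def cuda_wheel_tag(cuda_version):
    """Turn '12.1' into 'cu121' and '11.8' into 'cu118'."""
    if not re.fullmatch(r"\d+\.\d+", cuda_version):
        raise ValueError(f"expected MAJOR.MINOR, got {cuda_version!r}")
    return "cu" + cuda_version.replace(".", "")


def pip_install_command(cuda_version, torch_version="2.2.2"):
    """Build the pip argument list for a CUDA-specific PyTorch wheel."""
    index_url = f"{PYTORCH_WHEEL_BASE}/{cuda_wheel_tag(cuda_version)}"
    return ["pip", "install", f"torch=={torch_version}", "--index-url", index_url]


if __name__ == "__main__":
    print(" ".join(pip_install_command("12.1")))
    print(" ".join(pip_install_command("11.8")))
```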

Verify PyTorch CUDA

python - <<'PY'
import torch
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
PY
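torch.cuda.is_available() only reports that a device and driver were found; it does not launch a kernel. A minimal sketch of a stronger check, assuming device index 0 and an arbitrary 2048×2048 problem size, runs one matmul on the GPU and copies the result back. The function name is hypothetical.

```python
def torch_gpu_matmul_check(size=2048):
    """Run one matmul on cuda:0 and return a short status string."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    device = torch.device("cuda:0")
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    mean = (a @ b).mean().item()  # .item() copies the scalar back to the host
    torch.cuda.synchronize()      # wait for any queued kernels to finish
    return f"GPU matmul ok (mean={mean:.4f})"


if __name__ == "__main__":
    print(torch_gpu_matmul_check())
```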

3. TensorFlow (GPU) Installation

TensorFlow ships a single tensorflow wheel that includes GPU support; the GPU is used only when the installed build matches the host's driver/CUDA/cuDNN stack. On native Windows, GPU support ended with TensorFlow 2.10, so newer releases need WSL2 there. Install a TensorFlow version that is compatible with the host you are bringing up. Example:

pip install tensorflow
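To see which CUDA and cuDNN versions the installed wheel expects, TensorFlow exposes its build metadata through tf.sysconfig.get_build_info(). The sketch below wraps that call; the function name tensorflow_build_summary is hypothetical, and the CUDA-related keys are read with .get() because they are only present on CUDA-enabled builds.

```python
def tensorflow_build_summary():
    """Return the CUDA/cuDNN versions this TensorFlow wheel was built against.

    Returns None when TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    info = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "is_cuda_build": info.get("is_cuda_build", False),
        # These keys are only present on CUDA-enabled builds.
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
    }


if __name__ == "__main__":
    print(tensorflow_build_summary())
```

Compare the reported versions with what the host driver supports before debugging anything else.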

Verify TensorFlow CUDA

python - <<'PY'
import tensorflow as tf
print("TensorFlow:", tf.__version__)
gpus = tf.config.list_physical_devices("GPU")
print("GPUs:", gpus)
if not gpus:
    raise SystemExit("No GPU devices detected")
with tf.device("/GPU:0"):
    a = tf.random.normal((2048, 2048))
    b = tf.random.normal((2048, 2048))
    _ = tf.reduce_mean(tf.matmul(a, b)).numpy()
print("GPU matmul ok")
PY

4. Switching Between CPU and GPU Runs

  • CPU-only smoke test: set CUDA_VISIBLE_DEVICES to an empty string (which hides all CUDA devices) and run:

    gpumemprof info
    gpumemprof track --duration 10 --interval 0.5 --output track.json --format json
    gpumemprof analyze track.json --format txt --output analysis.txt
    gpumemprof diagnose --duration 0 --output ./diag
    

    On Apple Silicon, an empty CUDA_VISIBLE_DEVICES disables CUDA, but gpumemprof info may still report the mps backend. Treat this as a non-CUDA smoke test rather than a strictly CPU-only run.

  • GPU path: unset CUDA_VISIBLE_DEVICES (or set it to a GPU index) and run one known-good PyTorch workload plus the TensorFlow GPU matmul check:

    python -m examples.basic.pytorch_demo
    python - <<'PY'
    import tensorflow as tf
    from stormlog.tensorflow import TFMemoryProfiler
    
    profiler = TFMemoryProfiler(device="/GPU:0", enable_tensor_tracking=True)
    with profiler.profile_context("matmul_step"):
        a = tf.random.normal((4096, 4096))
        b = tf.random.normal((4096, 4096))
        c = tf.matmul(a, b)
        _ = tf.reduce_mean(c).numpy()
    
    results = profiler.get_results()
    print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
    print(f"Snapshots captured: {len(results.snapshots)}")
    PY
    

    After this path runs cleanly, you can run python -m examples.basic.tensorflow_demo, the training-backed example available from a source checkout.
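The two paths differ only in how CUDA_VISIBLE_DEVICES is set for the child process. A minimal sketch of that switch follows; it assumes the CPU path uses an empty value (which hides every CUDA device) and the GPU path either removes the variable or pins one index. The helper name env_for_run is hypothetical.

```python
import os


def env_for_run(mode, gpu_index=0, base_env=None):
    """Return a copy of the environment configured for a 'cpu' or 'gpu' run."""
    env = dict(os.environ if base_env is None else base_env)
    if mode == "cpu":
        # An empty value hides every CUDA device from the process.
        env["CUDA_VISIBLE_DEVICES"] = ""
    elif mode == "gpu":
        if gpu_index is None:
            # Unset: the CUDA runtime exposes all devices.
            env.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            # Pin the run to a single device index.
            env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return env
```

Pass the result as env= to subprocess.run when launching a command such as gpumemprof info, so the calling shell's environment is left untouched.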

5. Common Issues

| Symptom | Fix |
| --- | --- |
| torch.cuda.is_available() is False | Confirm NVIDIA driver is installed, retry with the correct CUDA wheel, or reboot after driver install. |
| TensorFlow sees /GPU:0 but training-backed ops fail | Confirm the GPU matmul check above succeeds first, then align the TensorFlow/CUDA/cuDNN stack before moving to Keras or cuDNN-backed demos. |
| TensorFlow cannot find cuDNN | Install a TensorFlow build that matches the current CUDA/cuDNN stack and rerun the GPU matmul check before using training-backed examples. |
| RuntimeError: CUDA driver not found | Check that nvidia-smi works on the command line; reinstall the driver if necessary. |
| CI path installing wrong framework | Follow .github/workflows/ci.yml logic: install the base deps, then exactly one framework (PyTorch or TensorFlow). |
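Several of the fixes above start with confirming that nvidia-smi works. A minimal sketch of that check from Python, assuming nvidia-smi is expected on PATH, queries the GPU name and driver version and returns None when the tool is missing or cannot reach the driver. The function name query_nvidia_smi is hypothetical.

```python
import shutil
import subprocess


def query_nvidia_smi(timeout=10):
    """Return a list of 'name, driver_version' lines, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # driver tools are not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=timeout, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # nvidia-smi exists but failed or timed out
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = query_nvidia_smi()
    print("nvidia-smi unavailable" if gpus is None else gpus)
```

A None result points at the driver install rather than at PyTorch or TensorFlow.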

6. Next Steps

Once the GPU stack is working, run the Textual TUI for an interactive check:

pip install "stormlog[tui,torch]"
stormlog

Use the overview tab to confirm the profiler sees your GPUs before running the full benchmarking or release checklist.