GPU Stack Installation & CUDA Enablement

Use this guide to install the GPU-capable versions of PyTorch and TensorFlow, verify CUDA availability, and switch between CPU and GPU smoke tests.

1. Prerequisites

An NVIDIA driver that supports the CUDA version you plan to use. Update it through GeForce Experience or the NVIDIA App (Windows) or your distro’s driver manager (Linux).
CUDA toolkit (optional). Most pip wheels bundle the needed runtime, but native builds may require the toolkit/nvcc.
Python 3.10–3.12 in a virtual environment.
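
The Python-side prerequisites can be checked before installing anything. The following is a minimal sketch using only the standard library; the helper name `check_python_prereqs` is illustrative and not part of any package mentioned in this guide, and the virtual-environment check only recognises `venv`/`virtualenv` style environments.

```python
import sys

# Supported range taken from the prerequisites list above.
SUPPORTED_MIN = (3, 10)
SUPPORTED_MAX = (3, 12)


def check_python_prereqs(version=None, prefix=None, base_prefix=None):
    """Report whether the interpreter is in range and inside a virtual environment.

    The arguments default to the running interpreter; they are parameters
    only so the check can be exercised with other values.
    """
    version = tuple(version or sys.version_info[:2])
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return {
        "python": ".".join(map(str, version)),
        "version_ok": SUPPORTED_MIN <= version[:2] <= SUPPORTED_MAX,
        # In a venv, sys.prefix points at the environment and differs from
        # sys.base_prefix. Conda environments are not detected this way.
        "in_venv": prefix != base_prefix,
    }


if __name__ == "__main__":
    print(check_python_prereqs())
```

If `version_ok` or `in_venv` comes back false, fix the interpreter or environment first; the framework installs below assume both hold.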

2. PyTorch (CUDA) Installation

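Pick a CUDA build that your driver supports: the driver must support the wheel's CUDA version or newer, and `nvidia-smi` shows the highest CUDA version the installed driver supports. Example for CUDA 12.1: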

# Windows / Linux (same pip command; CUDA wheels are not published for macOS)
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121

Need CUDA 11.8 instead? Swap the wheel URL:

pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu118
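
The two commands above differ only in the suffix of the index URL (`cu121` versus `cu118`). As a small illustration, the sketch below derives that suffix from a CUDA version string and assembles the pip command. `pytorch_index_url` and `pip_command` are hypothetical helpers, and the code does not check that a wheel for a given torch/CUDA pairing actually exists on the index.

```python
import shlex

BASE_URL = "https://download.pytorch.org/whl"


def pytorch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as '12.1' to the matching wheel index URL."""
    major, minor = cuda_version.split(".")[:2]
    return f"{BASE_URL}/cu{int(major)}{int(minor)}"


def pip_command(torch_version: str, cuda_version: str) -> str:
    """Build the pip install command as a single shell-safe string."""
    args = [
        "pip",
        "install",
        f"torch=={torch_version}",
        "--index-url",
        pytorch_index_url(cuda_version),
    ]
    return shlex.join(args)


if __name__ == "__main__":
    print(pip_command("2.2.2", "12.1"))
    print(pip_command("2.2.2", "11.8"))
```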

Verify PyTorch CUDA

python - <<'PY'
import torch
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
PY
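
The check above only reports whether CUDA is visible. To confirm that a kernel actually runs, a small matmul can be executed on the selected device, mirroring the TensorFlow check later in this guide. This is a sketch under assumptions: `torch_smoke_test` is an illustrative name, the matrix size is arbitrary, and the function falls back to the CPU when CUDA is not available.

```python
import importlib.util


def torch_smoke_test(size: int = 1024) -> dict:
    """Run a small matmul on the GPU if CUDA is usable, otherwise on the CPU."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return {"installed": False}

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    # .item() copies the scalar back to the host, which waits for the kernel.
    mean = (a @ b).mean().item()
    return {
        "installed": True,
        "device": str(device),
        "cuda_build": torch.version.cuda,  # None for CPU-only wheels
        "mean": mean,
    }


if __name__ == "__main__":
    print(torch_smoke_test())
```

A result with `"device": "cpu"` and `"cuda_build": None` usually means a CPU-only wheel was installed rather than one from a CUDA index.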

3. TensorFlow (GPU) Installation

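TensorFlow ≥2.11 ships a unified wheel that includes GPU support, but the GPU is only used when the installed TensorFlow build matches the host's driver/CUDA/cuDNN stack. On Linux, recent releases can also pull in matching CUDA and cuDNN libraries with `pip install "tensorflow[and-cuda]"`. On native Windows, GPU support ended with TensorFlow 2.10, so newer versions need WSL2. Install a TensorFlow version that is compatible with the host you are bringing up. Example: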

pip install tensorflow

Verify TensorFlow CUDA

python - <<'PY'
import tensorflow as tf
print("TensorFlow:", tf.__version__)
gpus = tf.config.list_physical_devices("GPU")
print("GPUs:", gpus)
if not gpus:
    raise SystemExit("No GPU devices detected")
with tf.device("/GPU:0"):
    a = tf.random.normal((2048, 2048))
    b = tf.random.normal((2048, 2048))
    _ = tf.reduce_mean(tf.matmul(a, b)).numpy()
print("GPU matmul ok")
PY
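
When no GPU is detected, it helps to see which CUDA and cuDNN versions the installed wheel was built against before changing anything on the host. The sketch below reads that from `tf.sysconfig.get_build_info()`; the wrapper name `tf_build_summary` is illustrative, and the CUDA-related keys are read with `.get()` on the assumption that they can be absent on CPU-only builds.

```python
import importlib.util


def tf_build_summary() -> dict:
    """Summarise the installed TensorFlow build and the GPUs it can see."""
    if importlib.util.find_spec("tensorflow") is None:
        # TensorFlow is not installed in this environment.
        return {"installed": False}

    import tensorflow as tf

    info = tf.sysconfig.get_build_info()
    return {
        "installed": True,
        "tensorflow": tf.__version__,
        "is_cuda_build": bool(info.get("is_cuda_build", False)),
        "cuda_version": info.get("cuda_version"),
        "cudnn_version": info.get("cudnn_version"),
        "gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in tf_build_summary().items():
        print(f"{key}: {value}")
```

Compare `cuda_version` and `cudnn_version` with what the host provides; a CUDA build with an empty `gpus` list points at a driver or library mismatch rather than at the wheel itself.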

4. Switching Between CPU and GPU Runs

CPU-only smoke test: set CUDA_VISIBLE_DEVICES to an empty value so that no CUDA devices are visible, then run:
```
gpumemprof info
gpumemprof track --duration 10 --interval 0.5 --output track.json --format json
gpumemprof analyze track.json --format txt --output analysis.txt
gpumemprof diagnose --duration 0 --output ./diag
```
On Apple Silicon, an empty CUDA_VISIBLE_DEVICES hides CUDA devices but does not affect Metal, so `gpumemprof info` may still report the mps backend. Treat this run as a non-CUDA smoke test rather than a strictly CPU-only one.

GPU path: unset CUDA_VISIBLE_DEVICES (or set it to a GPU index) and run one known-good PyTorch workload plus the TensorFlow GPU matmul check:

python -m examples.basic.pytorch_demo
python - <<'PY'
import tensorflow as tf
from stormlog.tensorflow import TFMemoryProfiler

profiler = TFMemoryProfiler(device="/GPU:0", enable_tensor_tracking=True)
with profiler.profile_context("matmul_step"):
    a = tf.random.normal((4096, 4096))
    b = tf.random.normal((4096, 4096))
    c = tf.matmul(a, b)
    _ = tf.reduce_mean(c).numpy()

results = profiler.get_results()
print(f"Peak memory: {results.peak_memory_mb:.2f} MB")
print(f"Snapshots captured: {len(results.snapshots)}")
PY

Once both GPU checks pass, you can run `python -m examples.basic.tensorflow_demo` from a source checkout as the training-backed example.
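
The CPU/GPU switch in this section comes down to one environment variable, so it can be scripted rather than set by hand. The following is a minimal sketch: `env_for` and `run_with` are hypothetical helpers, the variable is changed only for the child process, and behaviour with empty environment values was considered for POSIX shells only.

```python
import os
import subprocess
import sys
from typing import Optional


def env_for(mode: str, gpu_index: Optional[int] = None) -> dict:
    """Return a copy of the environment prepared for a CPU or GPU run."""
    env = dict(os.environ)
    if mode == "cpu":
        # An empty value hides every CUDA device from the child process.
        env["CUDA_VISIBLE_DEVICES"] = ""
    elif mode == "gpu":
        if gpu_index is None:
            # Unset: all GPUs stay visible.
            env.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            # Pin the run to a single GPU index.
            env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return env


def run_with(mode: str, command: list, gpu_index: Optional[int] = None):
    """Run a command with the environment for the given mode."""
    return subprocess.run(
        command,
        env=env_for(mode, gpu_index),
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    probe = [sys.executable, "-c",
             "import os; print(repr(os.environ.get('CUDA_VISIBLE_DEVICES')))"]
    print("cpu:", run_with("cpu", probe).stdout.strip())
    print("gpu:", run_with("gpu", probe).stdout.strip())
```

For example, `run_with("cpu", ["gpumemprof", "info"])` would run the first smoke-test command with CUDA hidden, and `run_with("gpu", [sys.executable, "-m", "examples.basic.pytorch_demo"])` would run the PyTorch workload with all GPUs visible.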

5. Common Issues

| Symptom | Fix |
| --- | --- |
| `torch.cuda.is_available()` is False | Confirm the NVIDIA driver is installed, retry with the correct CUDA wheel, or reboot after the driver install. |
| TensorFlow sees `/GPU:0` but training-backed ops fail | Confirm the GPU matmul check above succeeds first, then align the TensorFlow/CUDA/cuDNN stack before moving to Keras or cuDNN-backed demos. |
| TensorFlow cannot find cuDNN | Install a TensorFlow build that matches the current CUDA/cuDNN stack and rerun the GPU matmul check before using training-backed examples. |
| `RuntimeError: CUDA driver not found` | Check that `nvidia-smi` works on the command line; reinstall the driver if necessary. |
| CI path installing wrong framework | Follow `.github/workflows/ci.yml` logic: install the base deps, then exactly one framework (PyTorch or TensorFlow). |
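
Several of the fixes above start with confirming that `nvidia-smi` works. That check can be scripted with the tool's CSV query output. This is a sketch, not part of the project's tooling: `parse_gpu_rows` and `query_driver` are illustrative names, and the parser assumes one `name, driver_version` pair per line.

```python
import shutil
import subprocess

# Ask nvidia-smi for one "name, driver_version" line per GPU.
QUERY = ["--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_rows(output: str) -> list:
    """Parse 'name, driver_version' lines into a list of dicts."""
    rows = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the last comma so the GPU name is kept whole.
        name, _, driver = line.rpartition(",")
        rows.append({"name": name.strip(), "driver_version": driver.strip()})
    return rows


def query_driver():
    """Return the parsed GPU rows, or None if nvidia-smi is missing or fails."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, *QUERY], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return parse_gpu_rows(result.stdout)


if __name__ == "__main__":
    rows = query_driver()
    if rows is None:
        print("nvidia-smi not found or failed; check the driver install")
    else:
        for row in rows:
            print(row)
```

A `None` result corresponds to the driver-related rows in the table: the driver is missing, not on the PATH, or not loaded.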

6. Next Steps

Once the GPU stack is working, run the Textual TUI for an interactive check:

pip install "stormlog[tui,torch]"
stormlog

Use the overview tab to confirm the profiler sees your GPUs before running the full benchmarking or release checklist.