Troubleshooting Guide

This guide focuses on the failure modes that show up in the current codebase and workflows.

Installation and entrypoints

`gpumemprof: command not found`

Reinstall the package into the active environment:

pip install -e .
hash -r
gpumemprof --help

`tfmemprof: command not found`

Install the TensorFlow extra:

pip install -e ".[tf]"
hash -r
tfmemprof --help

`stormlog: command not found`

Install the TUI dependencies:

pip install -e ".[tui,torch]"
hash -r
stormlog

Missing dependencies

`ModuleNotFoundError: No module named 'torch'`

Install the PyTorch extra instead of trying to use CUDA-specific profiling without the framework:

pip install "stormlog[torch]"

`ModuleNotFoundError: No module named 'tensorflow'`

pip install "stormlog[tf]"

Visualization export errors

PNG and HTML exports depend on the visualization stack:

pip install "stormlog[viz]"

Runtime mismatches

`GPUMemoryProfiler` fails on a non-CUDA machine

That class is for CUDA-backed PyTorch profiling. On CPU-only or MPS-only systems, use:

gpumemprof monitor
gpumemprof track
CPUMemoryProfiler
CPUMemoryTracker

If you need setup guidance for real CUDA profiling, see the GPU Setup Guide.

TensorFlow CLI is installed but reports no GPU

Start with:

tfmemprof info

If it still reports no GPU devices, treat it as an environment problem first:

TensorFlow build may be CPU-only
device visibility may be restricted
the current host may genuinely be CPU-only

The profiler still supports CPU-backed TensorFlow runs.

TUI issues

Monitoring starts but Visualizations stays empty

The Visualizations tab only renders after timeline samples exist.

Use this sequence:

open Monitoring
click Start Live Tracking
let the workload run long enough to create samples
open Visualizations
click Refresh Timeline

Diagnostics loads but shows no rank data

Check the source you loaded:

live diagnostics require an active tracker session with telemetry events
artifact diagnostics require real JSON, CSV, or diagnose paths
after changing artifact paths, click Refresh

PNG or HTML export appears blank

This usually means there were no timeline samples, not that the export code failed.

Validate in order:

start tracking
confirm the monitoring log is receiving events
refresh the Visualizations tab
export again

The TUI layout looks broken

The app can run in a small terminal, but it is easier to use with a wider window. The deterministic snapshot coverage uses roughly 140x44.

CLI workflow issues

`gpumemprof analyze` rejects `--input`

That is expected. The current PyTorch CLI uses a positional input file:

gpumemprof analyze track.json --format txt --output analysis.txt

`tfmemprof analyze` rejects the positional input style

That is also expected. The current TensorFlow CLI uses --input:

tfmemprof analyze --input tf_monitor.json --detect-leaks --optimize

Diagnose bundle feels too slow

Use --duration 0 when you only need the bundle structure and not a new sampling window:

gpumemprof diagnose --duration 0 --output ./diag_bundle
tfmemprof diagnose --duration 0 --output ./tf_diag

CI and docs issues

Sphinx build fails locally

Install the docs extra and rebuild:

pip install -e ".[docs]"
python3 -m sphinx -W --keep-going -b html docs docs/_build/html

A docs snippet looks suspicious

Use the code and --help output as the source of truth, then run:

python3 -m pytest tests/test_docs_regressions.py -v

Recommended debugging path

If you are unsure where a failure belongs:

verify installation and entrypoints
verify framework availability with info
reproduce with the smallest matching example under examples/
capture telemetry or a diagnose bundle
inspect the result in the TUI or analyzer

If you installed from PyPI and do not have the examples/ package, use the CLI-first validation paths in the Usage Guide, Examples Guide, or CLI Guide instead.

← Back to main docs