Troubleshooting Guide
This guide focuses on the failure modes that show up in the current codebase and workflows.
Installation and entrypoints
gpumemprof: command not found
Reinstall the package into the active environment:
pip install -e .
hash -r
gpumemprof --help
tfmemprof: command not found
Install the TensorFlow extra:
pip install -e ".[tf]"
hash -r
tfmemprof --help
stormlog: command not found
Install the TUI dependencies:
pip install -e ".[tui,torch]"
hash -r
stormlog
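If it is unclear which of the three commands is missing, a short check against PATH can narrow it down before reinstalling. The following is a minimal sketch that uses only the Python standard library; the command names come from this guide, and the helper name find_entrypoints is made up for illustration.

```python
import shutil

# Entry point names are taken from this guide.
ENTRYPOINTS = ("gpumemprof", "tfmemprof", "stormlog")


def find_entrypoints(names=ENTRYPOINTS):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_entrypoints().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

Run it with the same interpreter and shell session you use for profiling, since a command installed into a different environment will also show up as missing.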
Missing dependencies
ModuleNotFoundError: No module named 'torch'
Install the PyTorch extra instead of trying to use CUDA-specific profiling without the framework:
pip install "stormlog[torch]"
ModuleNotFoundError: No module named 'tensorflow'
Install the TensorFlow extra:
pip install "stormlog[tf]"
Visualization export errors
PNG and HTML exports depend on the visualization stack:
pip install "stormlog[viz]"
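The framework checks above can be run in one pass without triggering the import errors themselves. This is a sketch, assuming only the module names and install commands shown in this guide; missing_extras is a hypothetical helper, and the visualization stack is left out because its module names are not listed here.

```python
import importlib.util

# Module name -> install command, both taken from this guide.
EXTRAS = {
    "torch": 'pip install "stormlog[torch]"',
    "tensorflow": 'pip install "stormlog[tf]"',
}


def missing_extras(extras=EXTRAS):
    """Return install commands for modules that cannot be found.

    find_spec only locates the module; it does not import it.
    """
    return {
        module: command
        for module, command in extras.items()
        if importlib.util.find_spec(module) is None
    }


if __name__ == "__main__":
    for module, command in missing_extras().items():
        print(f"{module} is missing, try: {command}")
```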
Runtime mismatches
GPUMemoryProfiler fails on a non-CUDA machine
That class is for CUDA-backed PyTorch profiling. On CPU-only or MPS-only systems, use:
gpumemprof monitor
gpumemprof track
CPUMemoryProfiler
CPUMemoryTracker
If you need setup guidance for real CUDA profiling, see the GPU Setup Guide.
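The choice between the CUDA and CPU classes can be made at runtime. The sketch below assumes only that torch.cuda.is_available() reflects whether CUDA profiling can work; it returns class names as strings because the import path for these classes is not shown in this guide.

```python
def cuda_available():
    """Return True only if PyTorch imports and reports a usable CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


def pick_profiler_name(has_cuda):
    """Return the class name this guide recommends for the given runtime."""
    return "GPUMemoryProfiler" if has_cuda else "CPUMemoryProfiler"


if __name__ == "__main__":
    print(pick_profiler_name(cuda_available()))
```

On an MPS-only machine torch.cuda.is_available() returns False, so this sketch falls through to the CPU class, which matches the advice above.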
TensorFlow CLI is installed but reports no GPU
Start with:
tfmemprof info
If it still reports no GPU devices, treat it as an environment problem first:
TensorFlow build may be CPU-only
device visibility may be restricted
the current host may genuinely be CPU-only
The profiler still supports CPU-backed TensorFlow runs.
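The three environment causes above can be checked from Python as well. This is a rough sketch, not part of the profiler: classify_no_gpu and probe are hypothetical helpers, the use of the CUDA_VISIBLE_DEVICES variable as the visibility signal is an assumption, and the build check only covers CUDA builds of TensorFlow.

```python
import os


def classify_no_gpu(built_with_cuda, visible_devices, gpu_count):
    """Name the most likely reason TensorFlow sees no GPU, in the order listed above."""
    if gpu_count > 0:
        return "gpu visible"
    if not built_with_cuda:
        return "TensorFlow build may be CPU-only"
    # Assumption: an empty or "-1" CUDA_VISIBLE_DEVICES hides all devices.
    if visible_devices is not None and visible_devices.strip() in ("", "-1"):
        return "device visibility may be restricted"
    return "host may genuinely be CPU-only"


def probe():
    """Collect the three inputs from the running environment (requires TensorFlow)."""
    import tensorflow as tf

    return classify_no_gpu(
        tf.test.is_built_with_cuda(),
        os.environ.get("CUDA_VISIBLE_DEVICES"),
        len(tf.config.list_physical_devices("GPU")),
    )
```

Calling probe() in the same environment as tfmemprof info should point at the same cause; if the two disagree, trust the CLI output.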
TUI issues
Monitoring starts but Visualizations stays empty
The Visualizations tab only renders after timeline samples exist.
Use this sequence:
open Monitoring
click Start Live Tracking
let the workload run long enough to create samples
open Visualizations
click Refresh Timeline
Diagnostics loads but shows no rank data
Check the source you loaded:
live diagnostics require an active tracker session with telemetry events
artifact diagnostics require paths to existing JSON, CSV, or diagnose bundle artifacts
after changing artifact paths, click Refresh
PNG or HTML export appears blank
This usually means there were no timeline samples, not that the export code failed.
Validate in order:
start tracking
confirm the monitoring log is receiving events
refresh the Visualizations tab
export again
The TUI layout looks broken
The app can run in a small terminal, but it is easier to use with a wider window. The deterministic snapshot coverage uses a terminal of roughly 140x44, so a window near that size is a reasonable target.
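To see how far the current terminal is from that reference size, a standard-library check is enough. The sketch below assumes 140x44 means 140 columns by 44 rows; the helper names are illustrative only.

```python
import shutil

# Approximate size used by the snapshot coverage, per this guide.
# Assumption: the figure is columns x rows.
REFERENCE_COLUMNS, REFERENCE_LINES = 140, 44


def terminal_shortfall(columns, lines):
    """Return (missing_columns, missing_lines) relative to the reference size."""
    return (
        max(0, REFERENCE_COLUMNS - columns),
        max(0, REFERENCE_LINES - lines),
    )


def current_shortfall():
    """Measure the terminal this process is attached to."""
    size = shutil.get_terminal_size()
    return terminal_shortfall(size.columns, size.lines)


if __name__ == "__main__":
    print(current_shortfall())
```

A result of (0, 0) means the window is at least as large as the reference; anything else is the number of columns and rows to add.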
CLI workflow issues
gpumemprof analyze rejects --input
That is expected. The current PyTorch CLI uses a positional input file:
gpumemprof analyze track.json --format txt --output analysis.txt
tfmemprof analyze rejects the positional input style
That is also expected. The current TensorFlow CLI uses --input:
tfmemprof analyze --input tf_monitor.json --detect-leaks --optimize
Diagnose bundle feels too slow
Use --duration 0 when you only need the bundle structure and not a new sampling window:
gpumemprof diagnose --duration 0 --output ./diag_bundle
tfmemprof diagnose --duration 0 --output ./tf_diag
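Scripts that drive both CLIs need to account for the different analyze input styles described above. A minimal sketch, assuming only the two invocation forms shown in this guide; build_analyze_argv is a hypothetical helper and adds no other flags.

```python
def build_analyze_argv(cli, input_path):
    """Build the analyze command line for the given CLI.

    gpumemprof takes the input file positionally; tfmemprof takes --input.
    """
    if cli == "gpumemprof":
        return ["gpumemprof", "analyze", input_path]
    if cli == "tfmemprof":
        return ["tfmemprof", "analyze", "--input", input_path]
    raise ValueError(f"unknown CLI: {cli!r}")


if __name__ == "__main__":
    print(" ".join(build_analyze_argv("gpumemprof", "track.json")))
    print(" ".join(build_analyze_argv("tfmemprof", "tf_monitor.json")))
```

The returned list can be passed directly to subprocess.run, with extra options such as --format or --detect-leaks appended by the caller.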
CI and docs issues
Sphinx build fails locally
Install the docs extra and rebuild:
pip install -e ".[docs]"
python3 -m sphinx -W --keep-going -b html docs docs/_build/html
A docs snippet looks suspicious
Use the code and --help output as the source of truth, then run:
python3 -m pytest tests/test_docs_regressions.py -v
Recommended debugging path
If you are unsure where a failure belongs:
verify installation and entrypoints
verify framework availability with info
reproduce with the smallest matching example under examples/
capture telemetry or a diagnose bundle
inspect the result in the TUI or analyzer
If you installed from PyPI and do not have the examples/ package, use the
CLI-first validation paths in the Usage Guide, Examples Guide,
or CLI Guide instead.