Command Line Guide
Stormlog currently exposes four console scripts:
gpumemprof
tfmemprof
jaxmemprof
stormlog
Use gpumemprof, tfmemprof, and jaxmemprof for automation. Use stormlog when you want the Textual TUI.
If you want task-oriented operational recipes instead of option-by-option guidance, use the Production Cookbook, especially Always-on Tracking, Incident Playbooks, and Distributed Diagnostics Recipes.
Verify the installed commands
gpumemprof --help
tfmemprof --help
jaxmemprof --help
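For scripted environments, the same check can be run from Python before a job starts. The sketch below only looks up the console scripts named in this guide on PATH using the standard library. The helper name find_stormlog_scripts is illustrative and not part of Stormlog.

```python
import shutil

# Console scripts named in this guide.
SCRIPTS = ("gpumemprof", "tfmemprof", "jaxmemprof", "stormlog")


def find_stormlog_scripts(names=SCRIPTS):
    """Map each script name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_stormlog_scripts().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

A None value usually means the package is not installed in the active environment or its scripts directory is missing from PATH.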
If you are working from a repository checkout, pip install -e . also exposes
the source-only examples/ package used in a few release-validation flows.
Install and launch the TUI with the current dependency set:
pip install "stormlog[tui,torch]"
stormlog
The stormlog command is also a small dispatcher. Running it without
arguments still launches the TUI, while stormlog query ... runs the local
artifact query CLI without importing Textual.
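The dispatcher behaviour can be pictured with a small sketch. This is not Stormlog's implementation: dispatch, run_query_cli, and launch_tui are hypothetical names, and the handling of unknown subcommands is an assumption. The point is only that the heavy TUI import happens inside the branch that needs it.

```python
def run_query_cli(args):
    """Hypothetical stand-in for the local artifact query CLI."""
    return ("query", list(args))


def launch_tui():
    """Hypothetical stand-in for the TUI entry point.

    A real dispatcher would import its TUI framework here, inside the
    function body, so the query path never pays for that import.
    """
    return ("tui", [])


def dispatch(argv):
    """Route `query ...` to the query CLI; launch the TUI when no arguments are given."""
    if not argv:
        return launch_tui()
    if argv[0] == "query":
        return run_query_cli(argv[1:])
    # Assumption: anything else is rejected; the real CLI may behave differently.
    raise SystemExit(f"unknown subcommand: {argv[0]}")
```

Deferring the import keeps query invocations fast and lets them run in environments where the TUI extra is not installed.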
gpumemprof
The current command groups are:
info
monitor
track
analyze
diagnose
Inspect environment
gpumemprof info
gpumemprof info --device 0 --detailed
gpumemprof info still reports the active PyTorch runtime first. When no
supported PyTorch GPU runtime is active, it now falls back to a best-effort host
GPU hardware probe so the command can show detected device names separately from
runtime availability. Supported PyTorch GPU runtimes remain NVIDIA CUDA, AMD
ROCm-backed PyTorch on Linux, and Apple MPS. In that unsupported-runtime mode,
--device is ignored because it only applies to an active PyTorch GPU runtime.
Capture a bounded monitoring window
gpumemprof monitor --duration 30 --interval 0.5 --output monitor.csv --format csv
gpumemprof monitor --duration 30 --interval 0.5 --output monitor.json --format json
Track events over time
gpumemprof track --duration 30 --interval 0.5 --output track.json --format json
gpumemprof track --warning-threshold 75 --critical-threshold 90 --output alerts.csv
gpumemprof track --job-id train-42 --rank 1 --local-rank 1 --world-size 8 --output rank1.json --format json
gpumemprof track --telemetry-sink-dir ./live_sink --telemetry-rollover-mb 32 --telemetry-retention-total-mb 256
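In multi-process jobs the identity flags are usually filled in per rank. The sketch below builds the track argument list from the environment variables that launchers such as torchrun set (RANK, LOCAL_RANK, WORLD_SIZE). Reading them this way is an assumption about your launcher, and build_track_argv is an illustrative helper, not a Stormlog API. Only flags shown in the commands above are used.

```python
import os


def build_track_argv(job_id, env=None, duration=30, interval=0.5):
    """Build a per-rank `gpumemprof track` argument list."""
    env = os.environ if env is None else env
    # Fallback values are an assumption for single-process runs.
    rank = env.get("RANK", "0")
    return [
        "gpumemprof", "track",
        "--duration", str(duration),
        "--interval", str(interval),
        "--job-id", job_id,
        "--rank", rank,
        "--local-rank", env.get("LOCAL_RANK", "0"),
        "--world-size", env.get("WORLD_SIZE", "1"),
        "--output", f"rank{rank}.json",  # one output file per rank
        "--format", "json",
    ]
```

The returned list can be passed to subprocess.run(argv, check=True) from a per-rank wrapper script.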
Every gpumemprof track run now creates exactly one session identity. The
session begins after tracker startup succeeds and before the first record is
persisted, and it is marked completed only after clean shutdown finalization
finishes.
If your Python workload instruments phases with tracker.phase(...) or
tracker.enter_phase(...), track persists the emitted phase_enter /
phase_exit records alongside the regular telemetry samples. The CLI does not
invent phase records on its own; it only preserves the structured phase events
your workload emitted. Phase records remain optional and do not change the
track CLI surface in v1.
For long-running tracking sessions, Stormlog now degrades gracefully when a collector becomes unhealthy:
the tracked workload keeps running
exported telemetry includes collector_degraded / collector_recovered events
per-event metadata marks partial or unhealthy collector state
retries use bounded exponential backoff instead of crashing the tracker loop
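As an illustration of the retry behaviour listed above, here is a minimal sketch of bounded exponential backoff around a failing collector. It is not Stormlog's implementation: the function names, the base delay, the growth factor, and the cap are all assumptions chosen for the example. Only the event names come from this guide.

```python
import time


def backoff_delays(base=0.5, factor=2.0, cap=30.0):
    """Yield delays that grow geometrically but never exceed `cap` seconds."""
    delay = min(base, cap)
    while True:
        yield delay
        delay = min(cap, delay * factor)


def collect_with_retry(collect, max_attempts=5, sleep=time.sleep):
    """Call `collect()` until it succeeds or attempts run out.

    Returns (sample, events). A collector failure never propagates to the
    caller, so the surrounding loop and the tracked workload keep running.
    """
    events = []
    delays = backoff_delays()
    for _ in range(max_attempts):
        try:
            sample = collect()
        except Exception:
            if not events:
                events.append("collector_degraded")  # record the first failure once
            sleep(next(delays))
            continue
        if events:
            events.append("collector_recovered")
        return sample, events
    return None, events  # still degraded after the last attempt
```

The cap is what makes the backoff bounded: without it, a collector that stays down for hours would push the retry interval far beyond any useful sampling cadence.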
For always-on sessions, gpumemprof track can also stream append-only
telemetry into a sink directory during the run instead of waiting for shutdown.
The sink writes JSONL segments plus a manifest, rolls segments when they hit the
configured size limit, and prunes the oldest closed segments to stay within the
retention budget.
The sink manifest also keeps a session ledger, so multiple runs can safely
share the same sink directory without merging captures. If a previous run was
still running when the process died, the next startup recovers it as
interrupted and starts a fresh session for the new run.
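The recovery rule can be sketched as follows. The ledger layout here (a list of dictionaries with session_id and status keys) is hypothetical and does not describe the real manifest schema. The sketch only shows the order of operations: mark stale running sessions as interrupted, then start a new session.

```python
import uuid


def start_session(ledger):
    """Recover stale sessions, then append and return a fresh running one.

    `ledger` is a list of dicts with hypothetical keys "session_id" and "status".
    """
    for entry in ledger:
        if entry["status"] == "running":
            # The previous process died before it could finalize this session.
            entry["status"] = "interrupted"
    session = {"session_id": str(uuid.uuid4()), "status": "running"}
    ledger.append(session)
    return session
```

Because every run gets its own identifier, records from the crashed run and the new run stay separable even though they share one directory.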
Useful sink options:
--telemetry-sink-dir
--telemetry-flush-seconds
--telemetry-rollover-mb
--telemetry-retention-files
--telemetry-retention-total-mb
The always-on qualification harness assumes the default sink settings:
flush every 2.0s
roll segments at 64 MB
retain at most 8 files
retain at most 512 MB total
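To make the retention budget concrete, the sketch below prunes the oldest closed segments until both default limits listed above hold. It is an illustration, not the sink's actual code: prune_segments is a made-up name, 1 MB is assumed to mean 1024 * 1024 bytes, and whether the active segment counts toward the limits is not modelled.

```python
def prune_segments(closed_segments, max_files=8, max_total_mb=512):
    """Return (kept, pruned) for closed segments ordered oldest first.

    `closed_segments` is a list of (name, size_bytes) tuples.
    """
    budget = max_total_mb * 1024 * 1024  # assumption: binary megabytes
    kept = list(closed_segments)
    pruned = []
    while kept and (len(kept) > max_files or sum(size for _, size in kept) > budget):
        pruned.append(kept.pop(0))  # drop the oldest closed segment first
    return kept, pruned
```

With ten 64 MB segments and the defaults, the two oldest segments are pruned and eight segments totalling 512 MB remain.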
When you inspect track output after a long run, look for these diagnostics in
the JSON or CLI summary:
rollover_count
pruned_segment_count
pruned_bytes
final_retained_files
final_retained_bytes
history_retained_*
history_dropped_*
Optional OOM flight-recorder support:
gpumemprof track \
--oom-flight-recorder \
--oom-dump-dir ./oom_dumps \
--oom-max-dumps 10 \
--oom-max-total-mb 1024 \
--output track.json --format json
Analyze saved telemetry
gpumemprof analyze track.json --format txt --output analysis.txt
gpumemprof analyze track.json --visualization --plot-dir plots
gpumemprof analyze ./live_sink
gpumemprof analyze ./live_sink --session-id 2b30f4a4-7d2d-48f7-a9f6-7d40c14eb95e
gpumemprof analyze takes a positional input path. It can now read a normal JSON
telemetry export, a sink JSONL segment, or a sink directory containing the
current and rolled append-only outputs. If you add --visualization, plots are
written to the directory passed via --plot-dir or to plots/ by default.
When the telemetry stream includes structured phase boundaries, the text summary also includes phase-aware hints such as:
Top gap phase: train / forward
Suspect phase: train / communication
When Stormlog cannot prove a unique phase but can still surface a useful winner, the summary uses a heuristic marker instead of pretending certainty:
Top gap phase: (likely) train / communication
In JSON report payloads, that distinction is preserved as:
canonical phase_attribution.phase_resolution
canonical phase_attribution.phase_source
optional phase_attribution.phase_summary, only when the displayed winner is heuristic
When multiple sessions are present, gpumemprof analyze selects:
the newest completed session
otherwise the newest interrupted session
otherwise the newest incomplete session
Use --session-id to analyze a specific capture instead of the default one.
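The default selection order can be expressed as a short function. This is a sketch of the documented rule only: the record layout (session_id, status, started_at) is hypothetical, and ordering "newest" by a start timestamp is an assumption.

```python
# Statuses in the order the default selection prefers them.
STATUS_PRIORITY = ("completed", "interrupted", "incomplete")


def pick_default_session(sessions):
    """Return the newest session of the most preferred status, or None.

    `sessions` is a list of dicts with hypothetical keys
    "session_id", "status", and "started_at" (any sortable timestamp).
    """
    for status in STATUS_PRIORITY:
        candidates = [s for s in sessions if s["status"] == status]
        if candidates:
            return max(candidates, key=lambda s: s["started_at"])
    return None
```

Note that an older completed session wins over a newer interrupted one, which is why --session-id matters when you want to inspect the run that just crashed.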
When phase records are present, gpumemprof analyze also reports the top
phase-attributed gap finding and the top first-cause suspect phase in the text
summary. The JSON report keeps the structured phase_attribution payload next
to:
gap_analysis
collective_attribution
cross_rank_analysis.first_cause_suspects
For always-on deployment posture and incident response checklists, continue with Always-on Tracking and Incident Playbooks.
Produce a diagnose bundle
gpumemprof diagnose --duration 5 --interval 0.5 --output ./diag_bundle
gpumemprof diagnose --duration 0 --output ./diag_bundle_quick
gpumemprof diagnose --native-history --duration 0 --output ./diag_bundle_native
Use --duration 0 when you want a fast artifact bundle without a new tracking window.
For task-oriented recipes that combine track, analyze, diagnose, and the
TUI into one workflow, continue with
Incident Playbooks and
Distributed Diagnostics Recipes.
Each standalone diagnose bundle also owns its own session id. The bundle
manifest records whether the run finished completed or was left
incomplete, and synthesized timeline telemetry inherits that same session id
when reloaded later.
--native-history is a CUDA-only debug mode. It records allocator history for
the current gpumemprof diagnose process, then writes native snapshot artifacts
such as cuda_allocator_snapshot.pickle,
cuda_allocator_state_history.html,
cuda_allocator_state_history_annotated.html, and tensor-attribution JSON
alongside the normal diagnose bundle files. The annotated HTML is the
Stormlog-native view that exposes the timeline trace, segment explorer, and
active-memory table in one file. For a maintained workflow example of that
artifact, continue with PyTorch Production Recipes. On
MPS, ROCm, or CPU-only runtimes, the command fails explicitly instead of
pretending support.
stormlog query
stormlog query asks structured questions over local artifact directories. It
uses the same canonical telemetry loaders as gpumemprof analyze, but exposes
rows that are easier to filter, export, and reuse from automation.
The query surface is local-first and file-backed. It reads sink manifests and bundle manifests before loading raw events, so listing sessions or OOM bundles does not require parsing every JSONL segment in a large sink directory.
List sessions:
stormlog query sessions ./live_sink --status interrupted --json
stormlog query sessions ./artifacts --has-oom-bundle --table
Query events:
stormlog query events ./live_sink \
--session-id 2b30f4a4-7d2d-48f7-a9f6-7d40c14eb95e \
--rank 0 \
--event-type collector_degraded \
--limit 50
List OOM bundles:
stormlog query ooms ./artifacts --backend cuda --table
stormlog query ooms ./artifacts --created-after 2026-05-12T00:00:00Z --json
List grouped recurring issues:
stormlog query issues ./live_sink ./oom_dumps --kind oom --json
stormlog query issues ./artifacts --severity warning --session-id session-123
Run built-in summaries:
stormlog query summary ./live_sink \
--metric peak_allocator_reserved_bytes \
--group-by session
stormlog query summary ./live_sink \
--metric hidden_memory_gap_growth \
--group-by session-rank
Supported output formats:
--table: readable table output, used by default
--json: machine-readable rows
--csv: row-query exports for sessions, events, and ooms
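When another process needs the rows, the --json output can be consumed through a thin wrapper. The sketch below is illustrative: query_argv and run_query are made-up helper names, and it assumes the command prints a single JSON document on stdout, which this guide does not specify.

```python
import json
import subprocess


def query_argv(kind, paths, *filters):
    """Build a `stormlog query` argument list that requests JSON rows."""
    return ["stormlog", "query", kind, *paths, *filters, "--json"]


def run_query(kind, paths, *filters):
    """Run the query and decode stdout. Requires stormlog to be installed."""
    done = subprocess.run(
        query_argv(kind, paths, *filters),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    # Assumption: stdout holds one JSON document, not one object per line.
    return json.loads(done.stdout)
```

For example, run_query("sessions", ["./live_sink"], "--status", "interrupted") runs the first session listing shown earlier in this section.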
The Python API behind the CLI is available as:
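import stormlog.query
# Open one store over several local artifact directories.
store = stormlog.query.open(["./live_sink", "./oom_dumps"])
# Each call corresponds to a query subcommand: sessions, events, ooms, issues.
sessions = store.list_sessions()
events = store.query_events()
ooms = store.list_oom_bundles()
issues = store.list_issues()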
For engine-choice details and follow-on work, see Local Query Layer. For issue grouping rules and schema details, see Durable Issue Fingerprinting.
tfmemprof
The current command groups are:
info
monitor
track
analyze
diagnose
Inspect environment
tfmemprof info
Monitor TensorFlow memory usage
tfmemprof monitor --interval 0.5 --duration 30 --output tf_monitor.json
tfmemprof monitor --interval 0.5 --duration 30 --threshold 4096 --device /GPU:0 --output tf_monitor_threshold.json
For CPU-only TensorFlow or when the GPU backend is unavailable, use
--device /CPU:0:
tfmemprof monitor --interval 0.5 --duration 30 --device /CPU:0 --output tf_monitor.json
tfmemprof track --interval 0.5 --threshold 4096 --device /CPU:0 --output tf_track.json
Track TensorFlow memory usage
tfmemprof track --interval 0.5 --threshold 4096 --output tf_track.json
tfmemprof track --interval 0.5 --threshold 4096 --job-id train-42 --rank 3 --local-rank 1 --world-size 8 --output tf_rank3.json
tfmemprof track --interval 0.5 --threshold 4096 --output tf_track.json --telemetry-sink-dir ./tf_live_sink
tfmemprof track follows the same degraded-mode rules as the PyTorch tracker:
collector failures pause new sample emission, status events remain visible in the
artifact stream, and normal sampling resumes automatically after recovery.
The same append-only sink options are available when you need bounded,
interrupt-tolerant TensorFlow telemetry during a long-running session.
TensorFlow tracking also keeps only a bounded recent history in memory. The current CLI output and JSON exports surface the retained vs dropped sample, event, and alert counts so long-running jobs can distinguish expected eviction from a silent memory-growth regression.
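The retained-versus-dropped bookkeeping can be illustrated with a bounded deque. This is a minimal sketch of the idea, not the tracker's implementation; the BoundedHistory class and its attribute names are hypothetical.

```python
from collections import deque


class BoundedHistory:
    """Keep only the most recent `max_items` entries and count evictions."""

    def __init__(self, max_items):
        self._items = deque(maxlen=max_items)
        self.dropped = 0

    def append(self, item):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the oldest entry is about to be evicted
        self._items.append(item)

    @property
    def retained(self):
        return len(self._items)

    def snapshot(self):
        """Return the retained entries, oldest first."""
        return list(self._items)
```

A steadily growing dropped count with a flat retained count is the expected signature of eviction in a long-running job, whereas a retained count that keeps growing would point to an unbounded buffer.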
The same session rules apply to TensorFlow tracking:
one session id per tfmemprof track run
sink recovery marks old running sessions as interrupted
loaders and diagnostics separate same-host runs by session_id, not by job or rank alone
Like the PyTorch and CPU trackers, TensorFlow tracking also preserves optional structured phase companion records when you instrument the tracker through the Python API.
Analyze TensorFlow results
tfmemprof analyze --input tf_monitor.json --detect-leaks --optimize
tfmemprof analyze --input tf_monitor.json --detect-leaks --optimize --visualize --report tf_report.txt
Unlike gpumemprof analyze, the TensorFlow analyzer uses --input.
Produce a diagnose bundle
tfmemprof diagnose --duration 5 --interval 0.5 --output ./tf_diag
tfmemprof diagnose --duration 0 --output ./tf_diag_quick
jaxmemprof
The current command groups are:
info
monitor
track
analyze
diagnose
Inspect environment
jaxmemprof info
Monitor JAX memory usage
jaxmemprof monitor --interval 0.5 --duration 30 --output jax_monitor.json
jaxmemprof monitor --interval 0.5 --duration 30 --device gpu --output jax_monitor_gpu.json
For CPU-only JAX execution or when accelerators are unavailable, use --device cpu:
jaxmemprof monitor --interval 0.5 --duration 30 --device cpu --output jax_monitor.json
jaxmemprof track --interval 0.5 --device cpu --output jax_track.json
Track JAX memory usage
jaxmemprof track --interval 0.5 --output jax_track.json
jaxmemprof track --interval 0.5 --job-id train-42 --rank 2 --local-rank 0 --world-size 8 --output jax_rank2.json
jaxmemprof track --interval 0.5 --output jax_track.json --telemetry-sink-dir ./jax_live_sink
jaxmemprof track follows the same degraded-mode rules as the PyTorch and TensorFlow trackers: it tolerates collector interruptions during long-running sessions and supports the same append-only sink persistence.
When you instrument your workload through jaxmemprof.MemoryTracker, JAX tracking also records structured phase boundaries. The telemetry streams it emits are compatible with gpumemprof analyze and the Textual TUI.
Analyze JAX results
jaxmemprof analyze --input jax_monitor.json --detect-leaks --optimize
jaxmemprof analyze --input jax_monitor.json --detect-leaks --optimize --visualize --report jax_report.txt
Produce a diagnose bundle
jaxmemprof diagnose --duration 5 --interval 0.5 --output ./jax_diag
jaxmemprof diagnose --duration 0 --output ./jax_diag_quick
TUI launch
pip install "stormlog[tui,torch]"
stormlog
Inside the TUI, the CLI & Actions tab exposes quick actions for:
gpumemprof info
gpumemprof monitor
tfmemprof monitor
jaxmemprof monitor
gpumemprof diagnose
sample workloads
OOM scenario runner
capability matrix smoke run
Release-validation shortcuts
Source checkout only. These commands require the examples/ package from a
repository clone:
python -m examples.cli.quickstart
python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated
Pip users should use this CLI-only sequence instead:
gpumemprof info
gpumemprof track --duration 2 --interval 0.5 --output track.json --format json
gpumemprof analyze track.json --format txt --output analysis.txt
gpumemprof diagnose --duration 0 --output ./diag
tfmemprof info
tfmemprof diagnose --duration 0 --output ./tf_diag
jaxmemprof info
jaxmemprof diagnose --duration 0 --output ./jax_diag
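The same sequence can be wrapped in a small script for CI. The sketch below runs the commands above in order and reports the first one that fails. run_smoke and the injectable runner parameter are illustrative choices, not part of Stormlog.

```python
import subprocess

# The CLI-only sequence from this section, one argument list per command.
SMOKE_COMMANDS = [
    ["gpumemprof", "info"],
    ["gpumemprof", "track", "--duration", "2", "--interval", "0.5",
     "--output", "track.json", "--format", "json"],
    ["gpumemprof", "analyze", "track.json", "--format", "txt",
     "--output", "analysis.txt"],
    ["gpumemprof", "diagnose", "--duration", "0", "--output", "./diag"],
    ["tfmemprof", "info"],
    ["tfmemprof", "diagnose", "--duration", "0", "--output", "./tf_diag"],
    ["jaxmemprof", "info"],
    ["jaxmemprof", "diagnose", "--duration", "0", "--output", "./jax_diag"],
]


def run_smoke(commands=SMOKE_COMMANDS, runner=subprocess.run):
    """Run commands in order; return the first failing command, or None."""
    for argv in commands:
        try:
            runner(argv, check=True)
        except (subprocess.CalledProcessError, FileNotFoundError):
            # Non-zero exit code, or the console script is not installed.
            return argv
    return None
```

Stopping at the first failure matters here because analyze depends on the track.json file written by the preceding track step.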
Choosing the right command
Use monitor when
you want a bounded sample window
you only need a simple CSV or JSON output
Use track when
you want event streams and alert thresholds
you want later exports or distributed identity fields
you want a reconstructable session that owns its sink, diagnose, or OOM artifacts
Use analyze when
you already have saved telemetry
you want a report or plot output
Use diagnose when
you need a portable artifact bundle to archive or share
you plan to inspect the output later in the TUI diagnostics flow