
Examples Guide

This page maps documented workflows to the example modules that are actually maintained in the repo.

Everything after the CLI-only validation section requires a source checkout. The example modules on this page live under examples/ and are not included in the PyPI distribution. If you installed with pip install stormlog, use the CLI-only validation below and the Python snippets in the Usage Guide instead.

CLI-only validation for pip users

Use this when you installed from PyPI and do not have the examples/ package:

gpumemprof info
gpumemprof track --duration 2 --interval 0.5 --output track.json --format json
gpumemprof analyze track.json --format txt --output analysis.txt
gpumemprof diagnose --duration 0 --output ./diag

tfmemprof info
tfmemprof diagnose --duration 0 --output ./tf_diag

This validates the installed CLI and produces artifacts you can load in the TUI Diagnostics tab.
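The same sequence can be driven from a script, for example as a post-install check. The following is a minimal sketch and not part of stormlog: the names VALIDATION_COMMANDS, missing_tools, and run_validation are hypothetical, and it assumes that a nonzero exit code means a step failed.

```python
import shutil
import subprocess

# Commands copied from the CLI-only validation list above.
VALIDATION_COMMANDS = [
    ["gpumemprof", "info"],
    ["gpumemprof", "track", "--duration", "2", "--interval", "0.5",
     "--output", "track.json", "--format", "json"],
    ["gpumemprof", "analyze", "track.json", "--format", "txt",
     "--output", "analysis.txt"],
    ["gpumemprof", "diagnose", "--duration", "0", "--output", "./diag"],
    ["tfmemprof", "info"],
    ["tfmemprof", "diagnose", "--duration", "0", "--output", "./tf_diag"],
]


def missing_tools(commands):
    """Return the sorted executables named in `commands` that are not on PATH."""
    tools = {cmd[0] for cmd in commands}
    return sorted(tool for tool in tools if shutil.which(tool) is None)


def run_validation(commands=VALIDATION_COMMANDS):
    """Run each command in order; return the first failing command, or None."""
    for cmd in commands:
        # Assumption: a nonzero exit code means the step failed.
        if subprocess.run(cmd).returncode != 0:
            return cmd
    return None
```

Calling missing_tools(VALIDATION_COMMANDS) first gives a clearer error than a failed launch when a console script is not installed. run_validation() then stops at the first failing step, so later steps never read an artifact that an earlier step failed to write.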

Start here

CLI smoke and environment validation

python -m examples.cli.quickstart

Use this when you want a fast signal that the installed console scripts work.

Release-style capability sweep

python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated

Use this when you want one command that touches the major launch-validation flows.

Python API examples

PyTorch

python -m examples.basic.pytorch_demo

This demo shows:

  • CUDA-gated GPUMemoryProfiler usage

  • profile_function

  • profile_context

  • summary reporting

CUDA allocator-history and attributed HTML

python -m examples.basic.cuda_native_history_demo --output ./diag_bundle_native_demo

This demo shows:

  • stormlog.cuda_native_debug.cuda_memory_history

  • capture_cuda_snapshot_artifacts

  • the annotated cuda_allocator_state_history_annotated.html artifact

  • a retained-allocation snapshot that populates the timeline, segment explorer, and active-memory table

TensorFlow

python -m examples.basic.tensorflow_demo

This demo shows:

  • TFMemoryProfiler

  • context profiling

  • TensorFlow result summaries

  • snapshot-driven reporting

This example exercises TensorFlow’s training-backed path. When you are bringing up a new GPU stack, start with the workload-backed /GPU:0 matmul recipe in TensorFlow Production Recipes before using this demo as a deeper source-checkout example.

Advanced tracking

python -m examples.advanced.tracking_demo

This demo shows:

  • MemoryTracker

  • alert callbacks

  • watchdog cleanup flow

  • exported CSV and JSON tracker events

Structured phase tracking

python -m examples.advanced.phase_tracking_demo

This demo shows:

  • tracker-scoped phase(...) context managers

  • nested phase boundaries with structured metadata

  • exported phase_enter / phase_exit records

  • phase-aware telemetry you can reload in gpumemprof analyze
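To make the enter/exit record shape concrete, here is a toy recorder built only on the standard library. It is not stormlog's tracker: the PhaseRecorder class, its phase method, and the record fields (event, phase, depth, metadata) are assumptions chosen for illustration. Only the phase_enter / phase_exit event names come from the list above.

```python
from contextlib import contextmanager


class PhaseRecorder:
    """Toy stand-in for a tracker that records nested phase boundaries."""

    def __init__(self):
        self.records = []   # emitted records, in order
        self._stack = []    # names of the currently open phases

    @contextmanager
    def phase(self, name, **metadata):
        # Emit an enter record, run the body, and always emit the exit record.
        self._stack.append(name)
        depth = len(self._stack)
        self.records.append(
            {"event": "phase_enter", "phase": name, "depth": depth, "metadata": metadata}
        )
        try:
            yield
        finally:
            self.records.append(
                {"event": "phase_exit", "phase": name, "depth": depth, "metadata": metadata}
            )
            self._stack.pop()


recorder = PhaseRecorder()
with recorder.phase("epoch", index=0):
    with recorder.phase("forward"):
        pass
    with recorder.phase("backward"):
        pass

for record in recorder.records:
    print(record["event"], record["phase"], record["depth"])
```

The try/finally block is the design point: an exception inside a phase still produces a matching exit record, so enter and exit events stay paired when the log is reloaded.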

Scenario modules

These are the closest examples to real operational workflows:

python -m examples.scenarios.cpu_telemetry_scenario
python -m examples.scenarios.mps_telemetry_scenario
python -m examples.scenarios.oom_flight_recorder_scenario --mode simulated
python -m examples.scenarios.tf_end_to_end_scenario
python -m examples.scenarios.wandb_training_smoke --device cuda --wandb-mode offline
python -m torch.distributed.run --nnodes=1 --nproc_per_node=2 -m examples.scenarios.torchrun_ddp_reference

When to use them

  • cpu_telemetry_scenario: validate CPU-only telemetry export

  • mps_telemetry_scenario: validate Apple Silicon / MPS flows

  • oom_flight_recorder_scenario: rehearse OOM artifact capture safely

  • tf_end_to_end_scenario: validate TensorFlow monitor, track, analyze, and diagnose flow together

  • wandb_training_smoke: run a short real PyTorch training loop that writes a summary bundle, an append-only sink, offline W&B files, and structured phase boundaries you can reload in gpumemprof analyze and the TUI

  • torchrun_ddp_reference: run a reference single-node DDP training job derived from the official PyTorch torchrun tutorial pattern, with one telemetry sink per rank and a shared distributed summary
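The "one telemetry sink per rank" layout depends on each worker knowing its own rank. torchrun exports RANK, LOCAL_RANK, and WORLD_SIZE to every worker process, so a per-rank path can be derived from the environment. The sketch below is not taken from torchrun_ddp_reference: the function name rank_sink_path, the telemetry_rank{N}.jsonl naming scheme, and the ./ddp_out directory are hypothetical.

```python
import os
from pathlib import Path


def rank_sink_path(output_dir, environ=None):
    """Build a per-rank sink path from the RANK variable set by torchrun."""
    env = os.environ if environ is None else environ
    # Fall back to rank 0 so the same code also runs outside torchrun.
    rank = int(env.get("RANK", "0"))
    # Hypothetical naming scheme; the reference scenario may use another one.
    return Path(output_dir) / f"telemetry_rank{rank}.jsonl"


# Simulate the environment of the second worker in a two-process job.
print(rank_sink_path("./ddp_out", {"RANK": "1", "WORLD_SIZE": "2"}))
```

Giving each rank its own file avoids concurrent appends to a single sink, which is the usual reason for this layout.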

Daily workflow mapping

ML engineer

Run:

python -m examples.basic.pytorch_demo

For TensorFlow, start with the /GPU:0 matmul recipe in TensorFlow Production Recipes when you are bringing up a GPU runtime, then run python -m examples.basic.tensorflow_demo once the training-backed path is aligned.

Then move to the Usage Guide if you want the same patterns embedded inside your own code.

Researcher or debugger

Run:

python -m examples.advanced.tracking_demo
python -m examples.scenarios.oom_flight_recorder_scenario --mode simulated

Then move to the TUI Guide and Troubleshooting Guide.

CI or release owner

Run:

python -m examples.cli.quickstart
python -m examples.cli.capability_matrix --mode smoke --target both --oom-mode simulated

Then move to the Testing and Validation Guide for the current CI mapping.

Markdown-only test guides

The old executable guides were replaced by Markdown checklists.

Those Markdown guides are also source-checkout only. Pip users should follow the CLI-only validation above and the Python API snippets in the Usage Guide.

Notes

  • Example modules are preferred over large inline doc snippets whenever a maintained script already exists.

  • Some examples are environment-gated. For example, examples.basic.pytorch_demo skips itself when CUDA is unavailable.
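An environment gate of this kind typically looks like the following sketch. It illustrates the general pattern and is not the code in examples.basic.pytorch_demo: the cuda_available helper, the main function, and the skip message are hypothetical. torch.cuda.is_available() is the standard PyTorch check.

```python
import importlib.util


def cuda_available():
    """Return True only if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is not installed, so there is nothing to profile
    import torch
    return torch.cuda.is_available()


def main():
    if not cuda_available():
        print("CUDA unavailable; skipping demo")
        return 0  # exit cleanly so runs on CPU-only machines do not fail
    # GPU profiling work would go here.
    return 0
```

Returning 0 on the skip path keeps a CPU-only CI job from treating a skipped demo as a failure.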

