stormlog

Stormlog - A comprehensive memory profiling tool.

class stormlog.GPUMemoryProfiler(device=None, track_tensors=True, track_cpu_memory=True, collect_stack_traces=False)

Bases: object

Comprehensive GPU memory profiler for PyTorch operations.

Parameters:
  • device (str | int | torch.device | None)

  • track_tensors (bool)

  • track_cpu_memory (bool)

  • collect_stack_traces (bool)

profile_function(func, *args, **kwargs)

Profile a single function call.

Parameters:
  • func (Callable[[...], Any]) – Function to profile

  • *args (Any) – Arguments to pass to function

  • **kwargs (Any) – Keyword arguments to pass to function

Returns:

ProfileResult with profiling information

Return type:

ProfileResult

profile_context(name='context')

Context manager for profiling a block of code.

Parameters:

name (str) – Name for the profiled context

Yields:

ProfileResult after the context exits

Return type:

Any

start_monitoring(interval=0.1)

Start continuous memory monitoring.

Parameters:

interval (float) – Monitoring interval in seconds

Return type:

None

stop_monitoring()

Stop continuous memory monitoring.

Return type:

None

get_summary()

Get a summary of all profiling results.

Return type:

Dict[str, Any]

clear_results()

Clear all profiling results and reset state.

Return type:

None

class stormlog.MemorySnapshot(timestamp, allocated_memory, reserved_memory, max_memory_allocated, max_memory_reserved, active_memory, inactive_memory, cpu_memory, device_id=0, operation=None, stack_trace=None)

Bases: object

Represents a memory snapshot at a specific point in time.

Parameters:
  • timestamp (float)

  • allocated_memory (int)

  • reserved_memory (int)

  • max_memory_allocated (int)

  • max_memory_reserved (int)

  • active_memory (int)

  • inactive_memory (int)

  • cpu_memory (int)

  • device_id (int)

  • operation (str | None)

  • stack_trace (str | None)

timestamp: float
allocated_memory: int
reserved_memory: int
max_memory_allocated: int
max_memory_reserved: int
active_memory: int
inactive_memory: int
cpu_memory: int
device_id: int = 0
operation: str | None = None
stack_trace: str | None = None

to_dict()

Convert snapshot to dictionary.

Return type:

Dict[str, Any]

class stormlog.ProfileResult(function_name, execution_time, memory_before, memory_after, memory_peak, memory_allocated, memory_freed, tensors_created, tensors_deleted, call_count=1)

Bases: object

Results from profiling a function or operation.

Parameters:
  • function_name (str)

  • execution_time (float)

  • memory_before (MemorySnapshot)

  • memory_after (MemorySnapshot)

  • memory_peak (MemorySnapshot)

  • memory_allocated (int)

  • memory_freed (int)

  • tensors_created (int)

  • tensors_deleted (int)

  • call_count (int)

function_name: str
execution_time: float
memory_before: MemorySnapshot
memory_after: MemorySnapshot
memory_peak: MemorySnapshot
memory_allocated: int
memory_freed: int
tensors_created: int
tensors_deleted: int
call_count: int = 1

memory_diff()

Calculate memory difference between before and after.

Return type:

int

peak_memory_usage()

Get peak memory usage during execution.

Return type:

int

to_dict()

Convert result to dictionary.

Return type:

Dict[str, Any]
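
As a rough illustration of how the two helper methods might relate to the stored snapshots, the sketch below uses simplified stand-in dataclasses (Snap and Result, both hypothetical) rather than the real stormlog classes. It assumes memory_diff() is the change in allocated bytes between the before and after snapshots, and that peak_memory_usage() reads the allocated bytes of the peak snapshot. Neither definition is confirmed by this reference.

```python
from dataclasses import dataclass, asdict


@dataclass
class Snap:
    """Stand-in for MemorySnapshot with only the field this sketch needs."""
    allocated_memory: int


@dataclass
class Result:
    """Stand-in for ProfileResult (hypothetical, heavily simplified)."""
    function_name: str
    memory_before: Snap
    memory_after: Snap
    memory_peak: Snap

    def memory_diff(self) -> int:
        # Assumption: difference in allocated bytes, after minus before.
        return self.memory_after.allocated_memory - self.memory_before.allocated_memory

    def peak_memory_usage(self) -> int:
        # Assumption: allocated bytes recorded in the peak snapshot.
        return self.memory_peak.allocated_memory

    def to_dict(self) -> dict:
        # asdict() recurses into the nested snapshot dataclasses.
        return asdict(self)


result = Result("forward", Snap(1_000), Snap(1_600), Snap(2_400))
print(result.memory_diff(), result.peak_memory_usage())
```

A positive difference would then indicate memory still held after the call, while the peak can be much larger than either endpoint.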

stormlog.profile_context(name='context', device=None, profiler=None)

Context manager for profiling a block of code.

Parameters:
  • name (str) – Name for the profiled context

  • device (str | int | torch.device | None) – GPU device to use for profiling

  • profiler (GPUMemoryProfiler | None) – Custom profiler instance to use

Yields:

ProfileResult after the context exits

Return type:

Iterator[GPUMemoryProfiler]

Example

    with profile_context("model_forward") as prof:
        output = model(input)
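
To make the "filled in after the context exits" behaviour concrete, here is a minimal sketch of a profiling context manager. It is a stand-in, not stormlog's implementation: profile_block is a hypothetical name, and it measures Python heap usage with the standard tracemalloc module instead of GPU memory so that it runs anywhere.

```python
import time
import tracemalloc
from contextlib import contextmanager


@contextmanager
def profile_block(name="context"):
    """Hypothetical stand-in: measures Python heap usage, not GPU memory."""
    record = {"name": name}
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    t0 = time.perf_counter()
    try:
        # The caller keeps a reference to `record`; it is completed on exit.
        yield record
    finally:
        after, peak = tracemalloc.get_traced_memory()
        record["execution_time"] = time.perf_counter() - t0
        record["memory_diff"] = after - before
        record["peak"] = peak  # peak since tracing began
        if started_here:
            tracemalloc.stop()


with profile_block("build_list") as rec:
    data = [bytes(1024) for _ in range(100)]

print(rec["name"], rec["memory_diff"] > 0)
```

The measurements only become available once the with block has finished, which is why the record is read after the block rather than inside it.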

stormlog.profile_function(func=None, *, name=None, device=None, profiler=None)

Decorator to profile a function’s GPU memory usage.

Can be used as @profile_function or @profile_function(name="custom_name").

Parameters:
  • func (F | None) – Function to profile (when used as @profile_function)

  • name (str | None) – Custom name for the profiled function

  • device (str | int | torch.device | None) – GPU device to use for profiling

  • profiler (GPUMemoryProfiler | None) – Custom profiler instance to use

Returns:

Decorated function or ProfileResult if called directly

Return type:

Callable[[F], F] | F
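
The signature func=None, *, name=None is the usual Python pattern for a decorator that works both bare and with keyword arguments. The sketch below shows that pattern in isolation. The name profiled is hypothetical, and the wrapper only records wall-clock time; it does not reproduce stormlog's memory measurements.

```python
import functools
import time


def profiled(func=None, *, name=None):
    """Hypothetical decorator usable as @profiled or @profiled(name="...")."""

    def decorate(target):
        label = name or target.__name__

        @functools.wraps(target)
        def wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            try:
                return target(*args, **kwargs)
            finally:
                # A real profiler would also sample memory here.
                wrapper.calls.append((label, time.perf_counter() - t0))

        wrapper.calls = []
        return wrapper

    # Bare form: Python passes the decorated function as `func`.
    if func is not None:
        return decorate(func)
    # Parameterized form: return the decorator to be applied next.
    return decorate


@profiled
def add(a, b):
    return a + b


@profiled(name="custom_name")
def mul(a, b):
    return a * b


print(add(2, 3), mul(2, 3), add.calls[0][0], mul.calls[0][0])
```

Making every option keyword-only is what lets the function tell the two forms apart: a positional argument can only be the decorated callable.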

class stormlog.MemoryVisualizer(profiler=None)

Bases: object

Comprehensive visualization tool for memory profiling data.

Parameters:

profiler (GPUMemoryProfiler | None)

plot_memory_timeline(results=None, snapshots=None, save_path=None, interactive=True)

Plot memory usage over time.

Parameters:
  • results (List[ProfileResult] | None) – List of ProfileResults to plot

  • snapshots (List[MemorySnapshot] | None) – List of MemorySnapshots to plot

  • save_path (str | None) – Path to save the plot

  • interactive (bool) – Whether to create interactive plot

Returns:

Matplotlib or Plotly figure

Return type:

matplotlib.pyplot.Figure | plotly.graph_objects.Figure

plot_cross_rank_timeline(events, save_path=None)

Plot a merged, aligned cross-rank device-memory timeline.

Parameters:
Return type:

matplotlib.pyplot.Figure

plot_function_comparison(results=None, metric='memory_allocated', save_path=None, interactive=True)

Compare memory usage across different functions.

Parameters:
  • results (List[ProfileResult] | None) – List of ProfileResults to compare

  • metric (str) – Metric to compare ('memory_allocated', 'execution_time', 'peak_memory')

  • save_path (str | None) – Path to save the plot

  • interactive (bool) – Whether to create interactive plot

Returns:

Matplotlib or Plotly figure

Return type:

matplotlib.pyplot.Figure | plotly.graph_objects.Figure

plot_memory_heatmap(results=None, save_path=None)

Create a heatmap showing memory usage patterns.

Parameters:
  • results (List[ProfileResult] | None) – List of ProfileResults to analyze

  • save_path (str | None) – Path to save the plot

Returns:

Matplotlib figure

Return type:

matplotlib.pyplot.Figure

create_dashboard(results=None, snapshots=None, save_path=None)

Create a comprehensive dashboard with multiple visualizations.

Parameters:
  • results (List[ProfileResult] | None) – List of ProfileResults

  • snapshots (List[MemorySnapshot] | None) – List of MemorySnapshots

  • save_path (str | None) – Path to save the dashboard

Returns:

Plotly figure with subplots

Return type:

plotly.graph_objects.Figure

export_data(results=None, snapshots=None, format='csv', save_path='memory_profile_data')

Export profiling data to various formats.

Parameters:
  • results (List[ProfileResult] | None) – List of ProfileResults to export

  • snapshots (List[MemorySnapshot] | None) – List of MemorySnapshots to export

  • format (str) – Export format ('csv', 'json', 'excel')

  • save_path (str) – Base path for saved files

Returns:

Path to saved file

Return type:

str

show(fig)

Display a figure.

Parameters:

fig (matplotlib.pyplot.Figure | plotly.graph_objects.Figure)

Return type:

None

class stormlog.MemoryAnalyzer(profiler=None, collective_sensitivity='medium', collective_threshold_overrides=None)

Bases: object

Advanced analyzer for memory profiling data.

Parameters:
  • profiler (GPUMemoryProfiler | None)

  • collective_sensitivity (str)

  • collective_threshold_overrides (Mapping[str, Any] | None)

analyze_memory_patterns(results=None)

Detect memory usage patterns in profiling data.

Parameters:

results (List[ProfileResult] | None) – List of ProfileResults to analyze

Returns:

List of detected patterns

Return type:

List[MemoryPattern]

generate_performance_insights(results=None)

Generate performance insights from profiling data.

Parameters:

results (List[ProfileResult] | None) – List of ProfileResults to analyze

Returns:

List of performance insights

Return type:

List[PerformanceInsight]

analyze_memory_gaps(events, *, phase_resolver=None)

Classify allocator-vs-device hidden memory gaps over time.

Parameters:
  • events

  • phase_resolver

Returns:

Prioritized list of gap findings (severity desc, confidence desc).

Return type:

List[GapFinding]
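
The ordering "severity desc, confidence desc" can be sketched as a two-key sort. Everything here is illustrative: Finding is a stand-in for GapFinding with only three fields, and the severity labels, their ranking, and the classification strings are assumptions, since this reference does not list the actual values.

```python
from dataclasses import dataclass

# Assumption: these severity labels and their ordering are illustrative only.
SEVERITY_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}


@dataclass
class Finding:
    """Stand-in with the three GapFinding fields used for ordering."""
    classification: str
    severity: str
    confidence: float


def prioritize(findings):
    # Highest severity first; ties are broken by higher confidence.
    return sorted(
        findings,
        key=lambda f: (SEVERITY_RANK.get(f.severity, -1), f.confidence),
        reverse=True,
    )


findings = [
    Finding("fragmentation", "medium", 0.9),
    Finding("non_allocator_growth", "high", 0.6),
    Finding("transient_spike", "high", 0.8),
]
ordered = prioritize(findings)
print([f.classification for f in ordered])
```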

analyze_cross_rank_timeline(events, *, phase_resolver=None)

Merge rank timelines and detect the earliest cluster-wide spike cause.

Parameters:
  • events

  • phase_resolver

Return type:

Dict[str, Any]

analyze_collective_attribution(events, *, phase_resolver=None)

Attribute hidden-memory spikes to collective communication phases.

Parameters:
  • events

  • phase_resolver

Return type:

List[CollectiveAttributionResult]

generate_optimization_report(results=None, events=None)

Generate a comprehensive optimization report.

Parameters:
  • results (List[ProfileResult] | None) – List of ProfileResults to analyze

  • events (List[TelemetryEventV2] | None) – Optional telemetry event series for gap analysis. When provided, the report includes a gap_analysis section.

Returns:

Comprehensive optimization report

Return type:

Dict[str, Any]

class stormlog.GapFinding(classification, severity, confidence, evidence, description, remediation, evidence_timestamp_ns=None, phase_attribution=None)

Bases: object

A classified finding from hidden-memory gap analysis.

Parameters:
  • classification (str)

  • severity (str)

  • confidence (float)

  • evidence (dict[str, Any])

  • description (str)

  • remediation (List[str])

  • evidence_timestamp_ns (int | None)

  • phase_attribution (PhaseAttribution | None)

classification: str
severity: str
confidence: float
evidence: dict[str, Any]
description: str
remediation: List[str]
evidence_timestamp_ns: int | None = None
phase_attribution: PhaseAttribution | None = None

class stormlog.MemoryTracker(device=None, sampling_interval=0.1, max_events=10000, enable_alerts=True, enable_oom_flight_recorder=False, oom_dump_dir='oom_dumps', oom_buffer_size=None, oom_max_dumps=5, oom_max_total_mb=256, job_id=None, rank=None, local_rank=None, world_size=None, enable_native_cuda_history=False, native_history_max_entries=100000, telemetry_sink_config=None)

Bases: object

Real-time memory tracker with alerts and monitoring.

Parameters:
  • device (str | int | torch.device | None)

  • sampling_interval (float)

  • max_events (int)

  • enable_alerts (bool)

  • enable_oom_flight_recorder (bool)

  • oom_dump_dir (str)

  • oom_buffer_size (int | None)

  • oom_max_dumps (int)

  • oom_max_total_mb (int)

  • job_id (str | None)

  • rank (int | None)

  • local_rank (int | None)

  • world_size (int | None)

  • enable_native_cuda_history (bool)

  • native_history_max_entries (int)

  • telemetry_sink_config (TelemetrySinkConfig | None)

get_session_summary()

Return the current or most recent tracking session summary.

Return type:

SessionSummary | None

property oom_buffer_size: int

Resolved OOM ring-buffer size.

start_tracking()

Start real-time memory tracking.

Return type:

None

stop_tracking()

Stop real-time memory tracking.

Return type:

None

enter_phase(name, *, metadata=None)

Enter one structured workload phase while tracking is active.

Parameters:
  • name (str)

  • metadata (Dict[str, Any] | None)

Return type:

PhaseHandle

phase(name, *, metadata=None)

Context manager that emits structured phase enter and exit records.

Parameters:
  • name (str)

  • metadata (Dict[str, Any] | None)

Return type:

Any
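
A context manager that emits paired enter and exit records might look like the sketch below. It is a stand-in under stated assumptions: the module-level records list replaces the tracker's event stream, and the record keys and the event names phase_enter and phase_exit are hypothetical.

```python
import time
from contextlib import contextmanager

records = []  # stand-in for the tracker's event stream


@contextmanager
def phase(name, *, metadata=None):
    """Hypothetical sketch: emit an enter record, then always an exit record."""
    records.append({
        "event": "phase_enter",
        "name": name,
        "metadata": dict(metadata or {}),
        "t_ns": time.monotonic_ns(),
    })
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so phases are always closed.
        records.append({
            "event": "phase_exit",
            "name": name,
            "t_ns": time.monotonic_ns(),
        })


with phase("forward", metadata={"step": 1}):
    with phase("attention"):
        pass

print([(r["event"], r["name"]) for r in records])
```

Nested phases close in reverse order of entry, which keeps the resulting intervals properly nested on a timeline.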

handle_exception(exc, context=None, metadata=None)

Capture OOM diagnostics for recognized OOM exceptions.

Parameters:
  • exc (BaseException)

  • context (str | None)

  • metadata (Dict[str, Any] | None)

Return type:

str | None

capture_oom(context='runtime', metadata=None)

Capture OOM diagnostic bundle if a tracked block raises OOM.

Parameters:
  • context (str)

  • metadata (Dict[str, Any] | None)

Return type:

Any

add_alert_callback(callback)

Add a callback function to be called on alerts.

Parameters:

callback (Callable[[TrackingEvent], None])

Return type:

None

remove_alert_callback(callback)

Remove an alert callback.

Parameters:

callback (Callable[[TrackingEvent], None])

Return type:

None
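
The add/remove callback pair follows a plain observer pattern, sketched below with a stand-in class. AlertHub, its observe method, the threshold name memory_warning_percent, and the shape of the event dictionary are all hypothetical; the real tracker passes TrackingEvent objects whose fields are not described here.

```python
class AlertHub:
    """Hypothetical stand-in for the callback registry of a tracker."""

    def __init__(self):
        self._callbacks = []
        self.thresholds = {"memory_warning_percent": 80.0}  # assumed name

    def add_alert_callback(self, callback):
        self._callbacks.append(callback)

    def remove_alert_callback(self, callback):
        if callback in self._callbacks:
            self._callbacks.remove(callback)

    def observe(self, used_percent):
        # Fire every registered callback once the threshold is reached.
        if used_percent >= self.thresholds["memory_warning_percent"]:
            event = {"event_type": "warning", "used_percent": used_percent}
            for callback in list(self._callbacks):
                callback(event)


seen = []


def on_alert(event):
    seen.append(event)


hub = AlertHub()
hub.add_alert_callback(on_alert)
hub.observe(50.0)   # below threshold, no callback
hub.observe(91.5)   # above threshold, callback fires
hub.remove_alert_callback(on_alert)
hub.observe(99.0)   # callback removed, nothing recorded
print(seen)
```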

get_events(event_type=None, last_n=None, since=None)

Get tracking events with optional filtering.

Parameters:
  • event_type (str | None) – Filter by event type

  • last_n (int | None) – Get last N events

  • since (float | None) – Get events since timestamp

Returns:

List of filtered events

Return type:

List[TrackingEvent]
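
How the three optional filters could combine is sketched below on plain dictionaries. This is not the tracker's code: filter_events is a hypothetical helper, the event type strings are made up, and two details are assumptions, namely that since is inclusive and that last_n is applied after the other filters.

```python
def filter_events(events, event_type=None, last_n=None, since=None):
    """Hypothetical sketch of the three optional filters, applied in turn."""
    selected = list(events)
    if event_type is not None:
        selected = [e for e in selected if e["event_type"] == event_type]
    if since is not None:
        # Assumption: the timestamp bound is inclusive.
        selected = [e for e in selected if e["timestamp"] >= since]
    if last_n is not None:
        # Applied last so it returns the newest N of the already-filtered events.
        selected = selected[-last_n:] if last_n > 0 else []
    return selected


events = [
    {"timestamp": 1.0, "event_type": "allocation"},
    {"timestamp": 2.0, "event_type": "warning"},
    {"timestamp": 3.0, "event_type": "allocation"},
    {"timestamp": 4.0, "event_type": "allocation"},
]
recent = filter_events(events, event_type="allocation", since=2.0, last_n=1)
print(recent)
```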

get_memory_timeline(interval=1.0)

Get memory usage timeline with specified interval.

Parameters:

interval (float) – Time interval in seconds for aggregation

Returns:

Dictionary with timeline data

Return type:

Dict[str, List]
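
Aggregating samples into fixed time intervals can be sketched as simple bucketing. The helper below is hypothetical: it averages the samples in each bucket, and it returns the keys timestamps and allocated. Both the averaging rule and the key names are assumptions, as this reference does not specify them.

```python
from collections import defaultdict


def bucket_timeline(samples, interval=1.0):
    """Hypothetical sketch: average (timestamp, bytes) samples per time bucket."""
    if not samples:
        return {"timestamps": [], "allocated": []}
    start = samples[0][0]
    buckets = defaultdict(list)
    for timestamp, allocated in samples:
        # Bucket index counts whole intervals elapsed since the first sample.
        buckets[int((timestamp - start) // interval)].append(allocated)
    keys = sorted(buckets)
    return {
        "timestamps": [start + k * interval for k in keys],
        "allocated": [sum(buckets[k]) / len(buckets[k]) for k in keys],
    }


samples = [(10.0, 100), (10.4, 300), (11.2, 500), (12.9, 700)]
timeline = bucket_timeline(samples, interval=1.0)
print(timeline)
```

A larger interval smooths short spikes away, so a small interval is the better choice when hunting transient peaks.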

get_statistics()

Get comprehensive tracking statistics.

Return type:

Dict[str, Any]

export_events(filename, format='csv')

Export tracking events to file.

Parameters:
  • filename (str) – Output filename

  • format (str) – Export format ('csv' or 'json')

Return type:

None

clear_events()

Clear all tracking events.

Return type:

None

set_threshold(threshold_name, value)

Set alert threshold.

Parameters:
  • threshold_name (str) – Name of the threshold

  • value (int | float) – Threshold value

Return type:

None

get_alerts(last_n=None)

Get all alert events (warnings, critical, errors).

Parameters:

last_n (int | None)

Return type:

List[TrackingEvent]

class stormlog.OOMFlightRecorder(config)

Bases: object

Bounded recorder that writes dump bundles on OOM.

Parameters:

config (OOMFlightRecorderConfig)

record_event(event)

Append one event payload to the in-memory ring buffer.

Parameters:

event (dict[str, Any])

Return type:

None

snapshot_events()

Return buffered events in chronological order.

Return type:

list[dict[str, Any]]

clear()

Discard buffered events for the next session/run.

Return type:

None
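
The three buffer methods above describe a bounded ring buffer. A minimal sketch of that data structure, using collections.deque with a maximum length, is shown below. RingRecorder is a hypothetical stand-in and says nothing about how OOMFlightRecorder stores its events internally.

```python
from collections import deque


class RingRecorder:
    """Hypothetical stand-in for a bounded in-memory event buffer."""

    def __init__(self, buffer_size):
        # deque(maxlen=N) silently drops the oldest item once N is exceeded.
        self._events = deque(maxlen=buffer_size)

    def record_event(self, event):
        # Copy so that later mutation by the caller is not recorded.
        self._events.append(dict(event))

    def snapshot_events(self):
        return list(self._events)  # oldest first

    def clear(self):
        self._events.clear()


recorder = RingRecorder(buffer_size=3)
for step in range(5):
    recorder.record_event({"step": step})
print(recorder.snapshot_events())
```

Because the buffer is bounded, memory use stays constant however long tracking runs, and a dump written at OOM time contains only the most recent events.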

dump(*, reason, exception, context, backend, metadata=None, session_summary=None)

Write an OOM diagnostic bundle and enforce retention constraints.

Parameters:
  • reason (str)

  • exception (BaseException)

  • context (str | None)

  • backend (str)

  • metadata (dict[str, Any] | None)

  • session_summary (SessionSummary | None)

Return type:

str | None

class stormlog.OOMFlightRecorderConfig(enabled=False, dump_dir='oom_dumps', buffer_size=10000, max_dumps=5, max_total_mb=256)

Bases: object

Runtime configuration for OOM flight recorder dumps.

Parameters:
  • enabled (bool)

  • dump_dir (str)

  • buffer_size (int)

  • max_dumps (int)

  • max_total_mb (int)

enabled: bool = False
dump_dir: str = 'oom_dumps'
buffer_size: int = 10000
max_dumps: int = 5
max_total_mb: int = 256

class stormlog.OOMExceptionClassification(is_oom, reason)

Bases: object

Normalized classification result for an exception.

Parameters:
  • is_oom (bool)

  • reason (str | None)

is_oom: bool
reason: str | None

stormlog.classify_oom_exception(exc)

Classify whether an exception corresponds to an OOM condition.

Parameters:

exc (BaseException)

Return type:

OOMExceptionClassification
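
One plausible way to classify exceptions is a type check followed by a message heuristic, sketched below. This is an assumption about the general approach, not the library's actual rules: Classification and classify are stand-ins, and the reason strings are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Classification:
    """Stand-in mirroring the two documented fields."""
    is_oom: bool
    reason: Optional[str]


def classify(exc: BaseException) -> Classification:
    # Assumption: check the exception type first, then fall back to the message.
    if isinstance(exc, MemoryError):
        return Classification(True, "python_memory_error")
    if "out of memory" in str(exc).lower():
        return Classification(True, "message_match")
    return Classification(False, None)


print(classify(RuntimeError("CUDA out of memory. Tried to allocate 2.00 GiB")))
print(classify(ValueError("bad shape")))
```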

class stormlog.TelemetryEventV2(schema_version, timestamp_ns, event_type, collector, sampling_interval_ms, pid, host, device_id, allocator_allocated_bytes, allocator_reserved_bytes, allocator_active_bytes, allocator_inactive_bytes, allocator_change_bytes, device_used_bytes, device_free_bytes, device_total_bytes, context, job_id=None, rank=0, local_rank=0, world_size=1, metadata=<factory>)

Bases: object

Legacy v2 telemetry event payload retained for backward-compatible writes/tests.

Parameters:
  • schema_version (Literal[2])

  • timestamp_ns (int)

  • event_type (str)

  • collector (str)

  • sampling_interval_ms (int)

  • pid (int)

  • host (str)

  • device_id (int)

  • allocator_allocated_bytes (int)

  • allocator_reserved_bytes (int)

  • allocator_active_bytes (int | None)

  • allocator_inactive_bytes (int | None)

  • allocator_change_bytes (int)

  • device_used_bytes (int)

  • device_free_bytes (int | None)

  • device_total_bytes (int | None)

  • context (str | None)

  • job_id (str | None)

  • rank (int)

  • local_rank (int)

  • world_size (int)

  • metadata (dict[str, Any])

schema_version: Literal[2]
timestamp_ns: int
event_type: str
collector: str
sampling_interval_ms: int
pid: int
host: str
device_id: int
allocator_allocated_bytes: int
allocator_reserved_bytes: int
allocator_active_bytes: int | None
allocator_inactive_bytes: int | None
allocator_change_bytes: int
device_used_bytes: int
device_free_bytes: int | None
device_total_bytes: int | None
context: str | None
job_id: str | None = None
rank: int = 0
local_rank: int = 0
world_size: int = 1
metadata: dict[str, Any]

class stormlog.DeviceMemoryCollector

Bases: ABC

Backend-specific collector contract for device memory signals.

abstract name()

Return runtime backend name (cuda, rocm, mps).

Return type:

str

abstract is_available()

Return whether this collector can sample in the current runtime.

Return type:

bool

abstract sample()

Collect a single normalized memory sample.

Return type:

DeviceMemorySample

sample_with_diagnostics()

Collect a sample while preserving core-failure diagnostics.

Return type:

DeviceMemorySampleResult

abstract capabilities()

Describe backend capability signals for telemetry metadata.

Return type:

Dict[str, Any]
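
The shape of the collector contract can be illustrated with a local abstract base class and one fake backend. Collector and FakeCollector are stand-ins written for this sketch. The real sample() returns a DeviceMemorySample rather than a dictionary, and the capability key shown is an invented example.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Collector(ABC):
    """Local stand-in for the documented abstract contract."""

    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def sample(self) -> Dict[str, Any]: ...

    @abstractmethod
    def capabilities(self) -> Dict[str, Any]: ...


class FakeCollector(Collector):
    """Hypothetical backend that reports fixed numbers, handy for tests."""

    def name(self) -> str:
        return "fake"

    def is_available(self) -> bool:
        return True

    def sample(self) -> Dict[str, Any]:
        # Field names follow DeviceMemorySample; None marks unsupported signals.
        return {
            "allocated_bytes": 512, "reserved_bytes": 1024, "used_bytes": 2048,
            "free_bytes": None, "total_bytes": None,
            "active_bytes": None, "inactive_bytes": None, "device_id": 0,
        }

    def capabilities(self) -> Dict[str, Any]:
        return {"supports_device_free": False}  # assumed key name


collector = FakeCollector()
print(collector.name(), collector.sample()["used_bytes"])
```

Leaving any abstract method unimplemented makes the subclass impossible to instantiate, so an incomplete backend fails at construction time rather than mid-run.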

class stormlog.DeviceMemorySample(allocated_bytes, reserved_bytes, used_bytes, free_bytes, total_bytes, active_bytes, inactive_bytes, device_id)

Bases: object

Normalized device-memory sample produced by a backend collector.

Parameters:
  • allocated_bytes (int)

  • reserved_bytes (int)

  • used_bytes (int)

  • free_bytes (int | None)

  • total_bytes (int | None)

  • active_bytes (int | None)

  • inactive_bytes (int | None)

  • device_id (int)

allocated_bytes: int
reserved_bytes: int
used_bytes: int
free_bytes: int | None
total_bytes: int | None
active_bytes: int | None
inactive_bytes: int | None
device_id: int

stormlog.build_device_memory_collector(device=None)

Build a backend collector for CUDA/ROCm/MPS runtime environments.

Parameters:

device (str | int | torch.device | None)

Return type:

DeviceMemoryCollector

stormlog.detect_torch_runtime_backend()

Return the active torch runtime backend in this environment.

Return type:

str

class stormlog.CPUMemoryProfiler

Bases: object

Lightweight CPU memory profiler mirroring the GPU API.

start_monitoring(interval=0.1)

Parameters:

interval (float)

Return type:

None

stop_monitoring()

Return type:

None

profile_function(func, *args, **kwargs)

Parameters:
  • func (Callable[[...], Any])

  • *args (Any)

  • **kwargs (Any)

Return type:

CPUProfileResult

profile_context(name='context')

Parameters:

name (str)

Return type:

Any

clear_results()

Return type:

None

get_summary()

Return type:

Dict[str, Any]

class stormlog.CPUMemoryTracker(sampling_interval=0.5, max_events=10000, enable_alerts=True, job_id=None, rank=None, local_rank=None, world_size=None, telemetry_sink_config=None)

Bases: object

CPU tracker offering a subset of the GPU tracker interface.

Parameters:
  • sampling_interval (float)

  • max_events (int)

  • enable_alerts (bool)

  • job_id (Optional[str])

  • rank (Optional[int])

  • local_rank (Optional[int])

  • world_size (Optional[int])

  • telemetry_sink_config (Optional[TelemetrySinkConfig])

get_session_summary()

Return type:

SessionSummary | None

start_tracking()

Return type:

None

stop_tracking()

Return type:

None

enter_phase(name, *, metadata=None)

Enter one structured CPU tracking phase.

Parameters:
  • name (str)

  • metadata (Dict[str, Any] | None)

Return type:

PhaseHandle

phase(name, *, metadata=None)

Context manager that emits structured CPU phase telemetry.

Parameters:
  • name (str)

  • metadata (Dict[str, Any] | None)

Return type:

Any

get_events(event_type=None, last_n=None, since=None)

Get tracking events with optional filtering.

Parameters:
  • event_type (str | None) – Filter by event type

  • last_n (int | None) – Get last N events

  • since (float | None) – Get events since timestamp

Returns:

List of filtered events

Return type:

List[TrackingEvent]

get_statistics()

Return type:

Dict[str, Any]

get_memory_timeline(interval=1.0)

Parameters:

interval (float)

Return type:

Dict[str, List[float]]

clear_events()

Return type:

None

export_events(filename, format='csv')

Parameters:
  • filename (str)

  • format (str)

Return type:

None

export_events_with_timestamp(directory, format)

Parameters:
  • directory (str)

  • format (str)

Return type:

str

stormlog.telemetry_event_from_record(record, permissive_legacy=True, default_collector='legacy.unknown', default_sampling_interval_ms=0, default_session_id=None)

Create a canonical telemetry event from v3, v2, or legacy records.

Parameters:
  • record (Mapping[str, Any])

  • permissive_legacy (bool)

  • default_collector (str)

  • default_sampling_interval_ms (int)

  • default_session_id (str | None)

Return type:

TelemetryEventV3

stormlog.telemetry_event_to_dict(event)

Serialize a telemetry event to a plain dictionary.

Parameters:

event (TelemetryEventV3 | TelemetryEventV2)

Return type:

dict[str, Any]

stormlog.validate_telemetry_record(record)

Validate a v2 or v3 telemetry record.

Raises:

ValueError – if the record is invalid or partial.

Parameters:

record (Mapping[str, Any])

Return type:

None

stormlog.load_telemetry_events(path, permissive_legacy=True, events_key=None, session_id=None)

Load telemetry events from JSON and return the selected session.

Parameters:
  • path (str | Path)

  • permissive_legacy (bool)

  • events_key (str | None)

  • session_id (str | None)

Return type:

list[TelemetryEventV3]

stormlog.resolve_distributed_identity(*, job_id=None, rank=None, local_rank=None, world_size=None, metadata=None, env=None)

Normalize distributed identity fields from explicit, metadata, or env inputs.

Parameters:
  • job_id (Any)

  • rank (Any)

  • local_rank (Any)

  • world_size (Any)

  • metadata (Mapping[str, Any] | None)

  • env (Mapping[str, str] | None)

Return type:

dict[str, Any]
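
Resolving identity fields from several sources usually comes down to a precedence chain. The sketch below assumes the order explicit argument, then metadata, then environment, and assumes the torchrun-style variable names RANK, LOCAL_RANK and WORLD_SIZE. Neither the order nor the variable names are confirmed by this reference. The fallback values (0, 0, 1) mirror the TelemetryEventV2 defaults, and job_id is omitted for brevity.

```python
import os


def resolve_identity(*, rank=None, local_rank=None, world_size=None,
                     metadata=None, env=None):
    """Hypothetical sketch: explicit argument, then metadata, then environment."""
    metadata = metadata or {}
    env = os.environ if env is None else env
    # Assumption: torchrun-style environment variable names.
    env_names = {"rank": "RANK", "local_rank": "LOCAL_RANK", "world_size": "WORLD_SIZE"}
    explicit = {"rank": rank, "local_rank": local_rank, "world_size": world_size}
    defaults = {"rank": 0, "local_rank": 0, "world_size": 1}

    resolved = {}
    for field, default in defaults.items():
        candidates = (explicit[field], metadata.get(field), env.get(env_names[field]))
        for candidate in candidates:
            if candidate is not None:
                # Environment values arrive as strings, so normalize to int.
                resolved[field] = int(candidate)
                break
        else:
            resolved[field] = default
    return resolved


identity = resolve_identity(
    rank=3,
    metadata={"rank": 1, "world_size": 8},
    env={"RANK": "7", "LOCAL_RANK": "2"},
)
print(identity)
```

Passing env explicitly, as the documented signature allows, keeps the function testable without touching the real process environment.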

class stormlog.TimelineMarker(session_id, start_ns, end_ns, kind, source, severity, label, rank=None, local_rank=None, world_size=None, event_type=None, metadata=<factory>)

Bases: object

Normalized timeline landmark derived from telemetry or annotation sources.

Parameters:
  • session_id (str)

  • start_ns (int)

  • end_ns (int | None)

  • kind (str)

  • source (str)

  • severity (str)

  • label (str)

  • rank (int | None)

  • local_rank (int | None)

  • world_size (int | None)

  • event_type (str | None)

  • metadata (dict[str, Any])

session_id: str
start_ns: int
end_ns: int | None
kind: str
source: str
severity: str
label: str
rank: int | None = None
local_rank: int | None = None
world_size: int | None = None
event_type: str | None = None
metadata: dict[str, Any]

property is_interval: bool

Return whether the marker spans a non-point interval.
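
The distinction between point and interval markers can be sketched with a small stand-in dataclass. The rule used here, an end timestamp that exists and is strictly later than the start, is an assumption about what "non-point" means; Marker is hypothetical and carries only the two timing fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Marker:
    """Stand-in with only the timing fields of TimelineMarker."""
    start_ns: int
    end_ns: Optional[int] = None

    @property
    def is_interval(self) -> bool:
        # Assumption: a point marker has no end, or an end equal to its start.
        return self.end_ns is not None and self.end_ns > self.start_ns


point = Marker(start_ns=1_000)
span = Marker(start_ns=1_000, end_ns=5_000)
print(point.is_interval, span.is_interval)
```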

stormlog.derive_timeline_markers(events, *, include_phase_markers=True)

Derive normalized timeline markers from telemetry events.

Parameters:
  • events (Sequence[Any])

  • include_phase_markers (bool)

Return type:

list[TimelineMarker]

stormlog.derive_session_timeline_markers(session, *, include_phase_markers=True)

Derive normalized markers from one loaded telemetry session.

Parameters:
  • session

  • include_phase_markers

Return type:

list[TimelineMarker]

stormlog.timeline_marker_to_dict(marker)

Serialize a marker into a JSON-safe mapping.

Parameters:

marker (TimelineMarker)

Return type:

dict[str, Any]

stormlog.get_gpu_info(device=None)

Get comprehensive GPU information.

Parameters:

device (str | int | torch.device | None) – GPU device to query (None for current device)

Returns:

Dictionary with GPU information

Return type:

Dict[str, Any]

stormlog.format_bytes(bytes_value, precision=2)

Format bytes into human-readable format.

Parameters:
  • bytes_value (int) – Number of bytes

  • precision (int) – Decimal precision

Returns:

Formatted string (e.g., “1.25 GB”)

Return type:

str

stormlog.convert_bytes(value, from_unit, to_unit)

Convert between different byte units.

Parameters:
  • value (int | float) – Value to convert

  • from_unit (str) – Source unit (B, KB, MB, GB, TB)

  • to_unit (str) – Target unit (B, KB, MB, GB, TB)

Returns:

Converted value

Return type:

float
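
The arithmetic behind the two byte helpers can be sketched as follows. The functions convert and fmt are hypothetical stand-ins, and the step between units is assumed to be 1024. This reference does not say whether the library uses 1024 or 1000, so treat the numbers as illustrative.

```python
UNITS = ["B", "KB", "MB", "GB", "TB"]


def convert(value, from_unit, to_unit, base=1024):
    """Hypothetical sketch; the base (1024 vs 1000) is an assumption."""
    # Positive exponent when moving to a smaller unit, negative to a larger one.
    exponent = UNITS.index(from_unit.upper()) - UNITS.index(to_unit.upper())
    return float(value) * (base ** exponent)


def fmt(bytes_value, precision=2, base=1024):
    """Hypothetical sketch of human-readable formatting."""
    value = float(bytes_value)
    for unit in UNITS:
        # Stop at the first unit where the number fits, or at the largest unit.
        if abs(value) < base or unit == UNITS[-1]:
            return f"{value:.{precision}f} {unit}"
        value /= base


print(convert(1536, "MB", "GB"))
print(fmt(1342177280))
```

Under the 1024 assumption, 1342177280 bytes formats as 1.25 GB, matching the example string in the format_bytes description.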

Modules

analyzer

Advanced analysis tools for memory profiling data.

attributed_viz

Stormlog-native memory visualisation with tensor attribution.

cli

Command-line interface for Stormlog.

collective_attribution

Heuristics for attributing hidden-memory spikes to collective communication.

collector_health

Shared collector-health state and retry helpers.

context_profiler

Context profiler for easy function and code block profiling.

cpu_profiler

CPU-only memory profiler and tracker.

cuda_native_debug

CUDA-native allocator history capture and attribution helpers.

derived_fields

Registry-driven derived-field layer for Stormlog telemetry.

device_collectors

Backend-aware device memory collector abstractions.

diagnose

Diagnostic bundle builder for the Stormlog diagnose command.

distributed_analysis

Distributed telemetry analysis helpers.

gap_analysis

Shared hidden-memory gap analysis utilities.

issues

Durable issue fingerprints and grouped issue row models.

jax

JAX support for Stormlog.

oom_flight_recorder

OOM flight recorder helpers for bounded event capture and dump artifacts.

phases

Structured phase telemetry helpers for trackers and analysis.

profiler

Core Stormlog profiler for PyTorch.

query

Local query API for Stormlog artifact directories and telemetry files.

query_cli

Command-line interface for local Stormlog artifact queries.

release_version

Helpers for deriving the next release version from Git tags.

session

Shared session identity and lifecycle helpers.

telemetry

Canonical telemetry event schema and legacy conversion helpers.

telemetry_model

Backend-neutral projection over the persisted telemetry event schema.

telemetry_sink

Append-only telemetry sink with rollover and retention bounds.

tensorflow

TensorFlow support for Stormlog.

timeline_markers

Derived timeline marker helpers for telemetry sessions.

tracker

Real-time memory tracking and monitoring.

tui

Textual-based terminal UI and top-level Stormlog dispatcher.

utils

Utility functions for GPU memory profiling.

visualizer

Visualization tools for GPU memory profiling data.

wandb_integration

Optional Weights & Biases export helpers for Stormlog outputs.