stormlog.device_collectors

Backend-aware device memory collector abstractions.

Functions

build_device_memory_collector([device])

Build a backend collector for CUDA/ROCm/MPS runtime environments.

detect_torch_runtime_backend()

Return the active torch runtime backend in this environment.

Classes

CudaDeviceCollector([device])

Collector for NVIDIA CUDA runtime memory counters.

DeviceMemoryCollector()

Backend-specific collector contract for device memory signals.

DeviceMemorySample(allocated_bytes, ...)

Normalized device-memory sample produced by a backend collector.

DeviceMemorySampleResult(sample[, ...])

Device-memory sample plus diagnostics about partial/core collection failures.

MPSDeviceCollector([device])

Collector for Apple Metal (MPS) runtime counters.

ROCmDeviceCollector([device])

Collector for ROCm runtimes surfaced through torch.cuda APIs.

class stormlog.device_collectors.DeviceMemoryCollector

Bases: ABC

Backend-specific collector contract for device memory signals.

abstract name()

Return the runtime backend name (cuda, rocm, or mps).

Return type:

str

abstract is_available()

Return whether this collector can sample in the current runtime.

Return type:

bool

abstract sample()

Collect a single normalized memory sample.

Return type:

DeviceMemorySample

sample_with_diagnostics()

Collect a sample while preserving core-failure diagnostics.

Return type:

DeviceMemorySampleResult

abstract capabilities()

Describe backend capability signals for telemetry metadata.

Return type:

Dict[str, Any]
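To illustrate the contract above, here is a minimal sketch of a concrete collector. It does not import stormlog: `DeviceMemoryCollector` and `DeviceMemorySample` are local stand-ins that mirror the documented signatures, and `FakeCollector`, its `"fake"` backend name, and the capability keys are hypothetical names chosen for illustration. The non-abstract `sample_with_diagnostics()` is omitted from the stand-in.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class DeviceMemorySample:
    # Local stand-in mirroring the documented fields; not the stormlog class.
    allocated_bytes: int
    reserved_bytes: int
    used_bytes: int
    free_bytes: Optional[int]
    total_bytes: Optional[int]
    active_bytes: Optional[int]
    inactive_bytes: Optional[int]
    device_id: int


class DeviceMemoryCollector(ABC):
    # Local stand-in for the abstract methods documented above.
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...

    @abstractmethod
    def sample(self) -> DeviceMemorySample: ...

    @abstractmethod
    def capabilities(self) -> Dict[str, Any]: ...


class FakeCollector(DeviceMemoryCollector):
    """Hypothetical backend that reports fixed numbers."""

    def name(self) -> str:
        return "fake"

    def is_available(self) -> bool:
        return True

    def sample(self) -> DeviceMemorySample:
        # Optional counters are None when the backend cannot report them.
        return DeviceMemorySample(
            allocated_bytes=512, reserved_bytes=1024, used_bytes=1024,
            free_bytes=None, total_bytes=None,
            active_bytes=None, inactive_bytes=None, device_id=0,
        )

    def capabilities(self) -> Dict[str, Any]:
        # Keys are illustrative only; the real metadata keys are not documented here.
        return {"backend": self.name(), "supports_free_total": False}


collector = FakeCollector()
if collector.is_available():
    print(collector.name(), collector.sample().allocated_bytes)
```

Checking `is_available()` before calling `sample()` keeps callers from sampling a backend that is not present in the current runtime.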

class stormlog.device_collectors.DeviceMemorySample(allocated_bytes, reserved_bytes, used_bytes, free_bytes, total_bytes, active_bytes, inactive_bytes, device_id)

Bases: object

Normalized device-memory sample produced by a backend collector.

Parameters:
  • allocated_bytes (int)
  • reserved_bytes (int)
  • used_bytes (int)
  • free_bytes (int | None)
  • total_bytes (int | None)
  • active_bytes (int | None)
  • inactive_bytes (int | None)
  • device_id (int)

allocated_bytes: int
reserved_bytes: int
used_bytes: int
free_bytes: int | None
total_bytes: int | None
active_bytes: int | None
inactive_bytes: int | None
device_id: int
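
Because several fields are typed `int | None`, code that consumes a sample has to handle missing counters. The sketch below shows one way to do that. The helpers `utilization` and `allocator_slack` are hypothetical and not part of stormlog. It assumes `reserved_bytes` and `allocated_bytes` follow the usual caching-allocator meaning (reserved is at least allocated), which this page does not state. `SimpleNamespace` stands in for a real sample object.

```python
from types import SimpleNamespace
from typing import Optional


def utilization(sample) -> Optional[float]:
    """Fraction of device memory in use, or None when the total is unknown."""
    if sample.total_bytes is None or sample.total_bytes <= 0:
        return None
    return sample.used_bytes / sample.total_bytes


def allocator_slack(sample) -> int:
    """Bytes reserved but not currently allocated (assumed semantics)."""
    return max(sample.reserved_bytes - sample.allocated_bytes, 0)


GIB = 1024 ** 3

# A sample where the backend reported every counter it could.
full = SimpleNamespace(
    allocated_bytes=2 * GIB, reserved_bytes=3 * GIB, used_bytes=4 * GIB,
    free_bytes=12 * GIB, total_bytes=16 * GIB,
    active_bytes=None, inactive_bytes=None, device_id=0,
)

# A sample from a backend that cannot report free/total memory.
partial = SimpleNamespace(
    allocated_bytes=2 * GIB, reserved_bytes=3 * GIB, used_bytes=3 * GIB,
    free_bytes=None, total_bytes=None,
    active_bytes=None, inactive_bytes=None, device_id=0,
)

print(utilization(full), utilization(partial), allocator_slack(full) // GIB)
```

Returning `None` instead of `0.0` when the total is unknown keeps "no data" distinguishable from "nothing in use".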

class stormlog.device_collectors.DeviceMemorySampleResult(sample, partial_fields=(), errors=<factory>, core_error=None)

Bases: object

Device-memory sample plus diagnostics about partial/core collection failures.

Parameters:
  • sample (DeviceMemorySample | None)
  • partial_fields (tuple[str, ...])
  • errors (dict[str, str])
  • core_error (str | None)

sample: DeviceMemorySample | None
partial_fields: tuple[str, ...] = ()
errors: dict[str, str]
core_error: str | None = None
property is_partial: bool
property is_core_failure: bool
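
The two properties are not described further here, so the following sketch only shows one plausible reading. `SampleResult` is a hypothetical stand-in with the same field layout as `DeviceMemorySampleResult`. The assumptions are that `is_partial` is true when `partial_fields` is non-empty and that `is_core_failure` is true when `core_error` is set. The real class may define them differently.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple


@dataclass
class SampleResult:
    # Hypothetical stand-in shaped like DeviceMemorySampleResult.
    sample: Optional[Any]
    partial_fields: Tuple[str, ...] = ()
    errors: Dict[str, str] = field(default_factory=dict)
    core_error: Optional[str] = None

    @property
    def is_partial(self) -> bool:
        # Assumption: partial means at least one optional field failed.
        return bool(self.partial_fields)

    @property
    def is_core_failure(self) -> bool:
        # Assumption: a core failure is signalled by core_error being set.
        return self.core_error is not None


def describe(result: SampleResult) -> str:
    """Summarize a result, checking the most severe condition first."""
    if result.is_core_failure:
        return f"core failure: {result.core_error}"
    if result.is_partial:
        missing = ", ".join(result.partial_fields)
        return f"partial sample, missing: {missing}"
    return "complete sample"


ok = SampleResult(sample=object())
degraded = SampleResult(
    sample=object(),
    partial_fields=("free_bytes", "total_bytes"),
    errors={"free_bytes": "query failed", "total_bytes": "query failed"},
)
failed = SampleResult(sample=None, core_error="driver unavailable")

for result in (ok, degraded, failed):
    print(describe(result))
```

Checking for a core failure before a partial result means a caller never reads `sample` when it may be `None`.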

class stormlog.device_collectors.CudaDeviceCollector(device=None)

Bases: DeviceMemoryCollector

Collector for NVIDIA CUDA runtime memory counters.

Parameters:

device (str | int | torch.device | None)

telemetry_collector = 'stormlog.cuda_tracker'

name()

Return the runtime backend name (cuda, rocm, or mps).

Return type:

str

is_available()

Return whether this collector can sample in the current runtime.

Return type:

bool

sample()

Collect a single normalized memory sample.

Return type:

DeviceMemorySample

sample_with_diagnostics()

Collect a sample while preserving core-failure diagnostics.

Return type:

DeviceMemorySampleResult

capabilities()

Describe backend capability signals for telemetry metadata.

Return type:

Dict[str, Any]

class stormlog.device_collectors.ROCmDeviceCollector(device=None)

Bases: CudaDeviceCollector

Collector for ROCm runtimes surfaced through torch.cuda APIs.

Parameters:

device (str | int | torch.device | None)

telemetry_collector = 'stormlog.rocm_tracker'

name()

Return the runtime backend name (cuda, rocm, or mps).

Return type:

str

is_available()

Return whether this collector can sample in the current runtime.

Return type:

bool

capabilities()

Describe backend capability signals for telemetry metadata.

Return type:

Dict[str, Any]

class stormlog.device_collectors.MPSDeviceCollector(device=None)

Bases: DeviceMemoryCollector

Collector for Apple Metal (MPS) runtime counters.

Parameters:

device (str | int | torch.device | None)

telemetry_collector = 'stormlog.mps_tracker'

name()

Return the runtime backend name (cuda, rocm, or mps).

Return type:

str

is_available()

Return whether this collector can sample in the current runtime.

Return type:

bool

sample()

Collect a single normalized memory sample.

Return type:

DeviceMemorySample

sample_with_diagnostics()

Collect a sample while preserving core-failure diagnostics.

Return type:

DeviceMemorySampleResult

capabilities()

Describe backend capability signals for telemetry metadata.

Return type:

Dict[str, Any]

stormlog.device_collectors.build_device_memory_collector(device=None)

Build a backend collector for CUDA/ROCm/MPS runtime environments.

Parameters:

device (str | int | torch.device | None)

Return type:

DeviceMemoryCollector

stormlog.device_collectors.detect_torch_runtime_backend()

Return the active torch runtime backend in this environment.

Return type:

str
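
This page does not list the strings the function can return or the order in which backends are probed. As a rough sketch of how such detection can be done with public PyTorch APIs, the hypothetical `detect_backend` below checks ROCm first (ROCm builds report through `torch.cuda` and set `torch.version.hip`), then CUDA, then MPS. The `"cpu"` fallback and the probing order are assumptions, not documented stormlog behavior.

```python
def detect_backend() -> str:
    """Hypothetical detection order: ROCm, then CUDA, then MPS, else 'cpu'."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no accelerator runtime to probe.
        return "cpu"

    if torch.cuda.is_available():
        # ROCm builds of PyTorch expose torch.version.hip as a version string;
        # on CUDA builds it is None.
        if getattr(torch.version, "hip", None):
            return "rocm"
        return "cuda"

    # torch.backends.mps is missing on older PyTorch releases.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(detect_backend())
```

A factory such as `build_device_memory_collector` could map the detected name to the matching collector class, but the actual dispatch logic is not shown on this page.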