stormlog.distributed_analysis

Distributed telemetry analysis helpers.

Functions

analyze_cross_rank_events(events, *[, ...])

Analyze distributed telemetry for merged timelines and first-cause spikes.

merge_cross_rank_timelines(events)

Merge rank streams into a single aligned timeline.

summarize_cross_rank_analysis(events, *[, ...])

Return a JSON-serializable cross-rank analysis summary.

Classes

CrossRankMergeResult(job_id, world_size, ...)

Merged distributed timeline state.

FirstCauseAnalysisResult(...[, notes])

The distributed first-cause detection result.

FirstCauseSuspect(rank, ...[, phase_attribution])

A ranked first-cause candidate.

RankTimelinePoint(rank, timestamp_ns, ...)

A single telemetry sample in a rank-aligned timeline.

class stormlog.distributed_analysis.CrossRankMergeResult(job_id, world_size, participating_ranks, missing_ranks, rank_sample_counts, alignment_offsets_ns, merged_points, notes=<factory>)[source]

Bases: object

Merged distributed timeline state.

Parameters:
  • job_id (str | None)

  • world_size (int)

  • participating_ranks (list[int])

  • missing_ranks (list[int])

  • rank_sample_counts (dict[int, int])

  • alignment_offsets_ns (dict[int, int])

  • merged_points (list[RankTimelinePoint])

  • notes (list[str])

job_id: str | None
world_size: int
participating_ranks: list[int]
missing_ranks: list[int]
rank_sample_counts: dict[int, int]
alignment_offsets_ns: dict[int, int]
merged_points: list[RankTimelinePoint]
notes: list[str]
class stormlog.distributed_analysis.FirstCauseAnalysisResult(cluster_onset_timestamp_ns, suspects, notes=<factory>)[source]

Bases: object

The distributed first-cause detection result.

Parameters:
  • cluster_onset_timestamp_ns (int | None)

  • suspects (list[FirstCauseSuspect])

  • notes (list[str])

cluster_onset_timestamp_ns: int | None
suspects: list[FirstCauseSuspect]
notes: list[str]
class stormlog.distributed_analysis.FirstCauseSuspect(rank, first_spike_timestamp_ns, aligned_first_spike_timestamp_ns, peak_delta_bytes, spike_window_samples, lead_over_cluster_onset_ns, confidence, evidence, phase_attribution=None)[source]

Bases: object

A ranked first-cause candidate.

Parameters:
  • rank (int)

  • first_spike_timestamp_ns (int)

  • aligned_first_spike_timestamp_ns (int)

  • peak_delta_bytes (int)

  • spike_window_samples (int)

  • lead_over_cluster_onset_ns (int)

  • confidence (str)

  • evidence (dict[str, int | str])

  • phase_attribution (PhaseAttribution | None)

rank: int
first_spike_timestamp_ns: int
aligned_first_spike_timestamp_ns: int
peak_delta_bytes: int
spike_window_samples: int
lead_over_cluster_onset_ns: int
confidence: str
evidence: dict[str, int | str]
phase_attribution: PhaseAttribution | None = None
class stormlog.distributed_analysis.RankTimelinePoint(rank, timestamp_ns, aligned_timestamp_ns, device_used_bytes, allocator_reserved_bytes, allocator_allocated_bytes, allocator_change_bytes)[source]

Bases: object

A single telemetry sample in a rank-aligned timeline.

Parameters:
  • rank (int)

  • timestamp_ns (int)

  • aligned_timestamp_ns (int)

  • device_used_bytes (int)

  • allocator_reserved_bytes (int)

  • allocator_allocated_bytes (int)

  • allocator_change_bytes (int)

rank: int
timestamp_ns: int
aligned_timestamp_ns: int
device_used_bytes: int
allocator_reserved_bytes: int
allocator_allocated_bytes: int
allocator_change_bytes: int
stormlog.distributed_analysis.analyze_cross_rank_events(events, *, phase_resolver=None)[source]

Analyze distributed telemetry for merged timelines and first-cause spikes.

Parameters:
Return type:

tuple[CrossRankMergeResult, FirstCauseAnalysisResult]

stormlog.distributed_analysis.merge_cross_rank_timelines(events)[source]

Merge rank streams into a single aligned timeline.

Parameters:

events (Sequence[TelemetryEventV2])

Return type:

CrossRankMergeResult

stormlog.distributed_analysis.summarize_cross_rank_analysis(events, *, phase_resolver=None)[source]

Return a JSON-serializable cross-rank analysis summary.

Parameters:
Return type:

dict[str, Any]