stormlog.distributed_analysis
Distributed telemetry analysis helpers.
Functions
|
Analyze distributed telemetry for merged timelines and first-cause spikes. |
|
Merge rank streams into a single aligned timeline. |
|
Return a JSON-serializable cross-rank analysis summary. |
Classes
|
Merged distributed timeline state. |
|
The distributed first-cause detection result. |
|
A ranked first-cause candidate. |
|
A single telemetry sample in a rank-aligned timeline. |
- class stormlog.distributed_analysis.CrossRankMergeResult(job_id, world_size, participating_ranks, missing_ranks, rank_sample_counts, alignment_offsets_ns, merged_points, notes=<factory>)[source]
Bases:
objectMerged distributed timeline state.
- Parameters:
job_id (str | None)
world_size (int)
participating_ranks (list[int])
missing_ranks (list[int])
rank_sample_counts (dict[int, int])
alignment_offsets_ns (dict[int, int])
merged_points (list[RankTimelinePoint])
notes (list[str])
- job_id: str | None
- world_size: int
- participating_ranks: list[int]
- missing_ranks: list[int]
- rank_sample_counts: dict[int, int]
- alignment_offsets_ns: dict[int, int]
- merged_points: list[RankTimelinePoint]
- notes: list[str]
- class stormlog.distributed_analysis.FirstCauseAnalysisResult(cluster_onset_timestamp_ns, suspects, notes=<factory>)[source]
Bases:
objectThe distributed first-cause detection result.
- Parameters:
cluster_onset_timestamp_ns (int | None)
suspects (list[FirstCauseSuspect])
notes (list[str])
- cluster_onset_timestamp_ns: int | None
- suspects: list[FirstCauseSuspect]
- notes: list[str]
- class stormlog.distributed_analysis.FirstCauseSuspect(rank, first_spike_timestamp_ns, aligned_first_spike_timestamp_ns, peak_delta_bytes, spike_window_samples, lead_over_cluster_onset_ns, confidence, evidence, phase_attribution=None)[source]
Bases:
objectA ranked first-cause candidate.
- Parameters:
rank (int)
first_spike_timestamp_ns (int)
aligned_first_spike_timestamp_ns (int)
peak_delta_bytes (int)
spike_window_samples (int)
lead_over_cluster_onset_ns (int)
confidence (str)
evidence (dict[str, int | str])
phase_attribution (PhaseAttribution | None)
- rank: int
- first_spike_timestamp_ns: int
- aligned_first_spike_timestamp_ns: int
- peak_delta_bytes: int
- spike_window_samples: int
- lead_over_cluster_onset_ns: int
- confidence: str
- evidence: dict[str, int | str]
- phase_attribution: PhaseAttribution | None = None
- class stormlog.distributed_analysis.RankTimelinePoint(rank, timestamp_ns, aligned_timestamp_ns, device_used_bytes, allocator_reserved_bytes, allocator_allocated_bytes, allocator_change_bytes)[source]
Bases:
objectA single telemetry sample in a rank-aligned timeline.
- Parameters:
rank (int)
timestamp_ns (int)
aligned_timestamp_ns (int)
device_used_bytes (int)
allocator_reserved_bytes (int)
allocator_allocated_bytes (int)
allocator_change_bytes (int)
- rank: int
- timestamp_ns: int
- aligned_timestamp_ns: int
- device_used_bytes: int
- allocator_reserved_bytes: int
- allocator_allocated_bytes: int
- allocator_change_bytes: int
- stormlog.distributed_analysis.analyze_cross_rank_events(events, *, phase_resolver=None)[source]
Analyze distributed telemetry for merged timelines and first-cause spikes.
- Parameters:
events (Sequence[TelemetryEventV2])
phase_resolver (PhaseReplayIndex | None)
- Return type:
- stormlog.distributed_analysis.merge_cross_rank_timelines(events)[source]
Merge rank streams into a single aligned timeline.
- Parameters:
events (Sequence[TelemetryEventV2])
- Return type:
- stormlog.distributed_analysis.summarize_cross_rank_analysis(events, *, phase_resolver=None)[source]
Return a JSON-serializable cross-rank analysis summary.
- Parameters:
events (Sequence[TelemetryEventV2])
phase_resolver (PhaseReplayIndex | None)
- Return type:
dict[str, Any]