Durable Issue Fingerprinting
Stormlog issue fingerprinting is a deterministic summarization layer over
existing artifacts. It groups repeated failures across sessions without mutating
TelemetryEvent v3, append-only sink segments, diagnose bundles, or OOM flight
recorder bundles.
The v1 implementation follows two outside patterns:
- Sentry SDK Fingerprinting treats fingerprints as explicit grouping dimensions.
- OpenTelemetry Trace Concepts keeps evidence linkable through context, events, attributes, and links.
Issue Object
stormlog.issues.StormlogIssue is the grouped issue row returned by
stormlog.query.QueryStore.list_issues().
Canonical fields:
- fingerprint_id: deterministic hash of schema version, issue kind, and normalized fingerprint dimensions
- fingerprint: kind, schema_version, fingerprint_id, and dimensions
- kind: oom, collector_degradation, alert, or hidden_memory_anomaly
- state: open, resolved, ignored, or regressed
- severity: info, warning, or critical
- title
- hit_count
- first_seen_ns, last_seen_ns
- affected_sessions
- representative_evidence
- evidence
- details
representative_evidence and each entry in evidence can link back to:
- session_id
- timestamp_ns
- rank
- source_path
- source_kind
- event_type
- bundle_path
- low-cardinality metadata
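The field lists above can be pictured as a pair of small record types. The following is only a sketch: `EvidenceRef`, `IssueRow`, and `record()` are hypothetical names used for illustration, not the actual `stormlog.issues.StormlogIssue` definition, and the update rules in `record()` (first evidence becomes the representative one) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvidenceRef:
    """Hypothetical link back to one raw artifact record."""
    session_id: str
    timestamp_ns: int
    source_path: str
    source_kind: str
    rank: Optional[int] = None
    event_type: Optional[str] = None
    bundle_path: Optional[str] = None
    metadata: dict = field(default_factory=dict)  # low-cardinality values only


@dataclass
class IssueRow:
    """Hypothetical stand-in for the grouped issue row."""
    fingerprint_id: str
    fingerprint: dict  # kind, schema_version, fingerprint_id, dimensions
    kind: str
    severity: str
    title: str
    state: str = "open"
    hit_count: int = 0
    first_seen_ns: Optional[int] = None
    last_seen_ns: Optional[int] = None
    affected_sessions: list = field(default_factory=list)
    representative_evidence: Optional[EvidenceRef] = None
    evidence: list = field(default_factory=list)
    details: dict = field(default_factory=dict)

    def record(self, ref: EvidenceRef) -> None:
        """Fold one more occurrence into the grouped row (assumed rules)."""
        self.hit_count += 1
        ts = ref.timestamp_ns
        if self.first_seen_ns is None or ts < self.first_seen_ns:
            self.first_seen_ns = ts
        if self.last_seen_ns is None or ts > self.last_seen_ns:
            self.last_seen_ns = ts
        if ref.session_id not in self.affected_sessions:
            self.affected_sessions.append(ref.session_id)
        if self.representative_evidence is None:
            self.representative_evidence = ref
        self.evidence.append(ref)
```

Keeping the evidence entries as references to raw artifacts, rather than copies of them, matches the goal of summarizing without mutating the underlying telemetry.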
Current query output defaults derived issues to open. State overrides are
accepted by fingerprint id in the Python API so a future persisted sidecar can
restore resolved, ignored, or regressed state without changing raw
telemetry.
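As an illustration of how state overrides keyed by fingerprint id might be applied, here is a minimal sketch. It operates on plain dictionaries rather than the real issue objects, and `apply_state_overrides` is a hypothetical helper, not part of the Stormlog API.

```python
VALID_STATES = {"open", "resolved", "ignored", "regressed"}


def apply_state_overrides(issues, overrides):
    """Return copies of the issue dicts with overridden states applied.

    `issues` is a list of dicts that each carry a "fingerprint_id";
    `overrides` maps fingerprint ids to one of VALID_STATES.
    Issues without an override keep their state, defaulting to "open".
    """
    result = []
    for issue in issues:
        state = overrides.get(issue["fingerprint_id"], issue.get("state", "open"))
        if state not in VALID_STATES:
            raise ValueError(f"unknown issue state: {state!r}")
        # Copy instead of mutating, so the derived rows stay reproducible.
        result.append({**issue, "state": state})
    return result
```

Because the overrides live outside the derived rows, dropping them reproduces the default all-open view from the same raw artifacts.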
Fingerprint Rules
Fingerprints contain stable grouping dimensions. They intentionally exclude session ids, timestamps, raw file paths, full exception messages, and metric magnitudes unless the value is converted into a stable category.
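One way to derive a deterministic id from schema version, issue kind, and normalized dimensions is to hash a canonical JSON encoding. This is a sketch under assumptions: the hash algorithm (SHA-256), the 16-character truncation, the `SCHEMA_VERSION` value, and the function name are all illustrative and may differ from what Stormlog actually does.

```python
import hashlib
import json

SCHEMA_VERSION = 1  # assumed value, for illustration only


def compute_fingerprint_id(kind, dimensions, schema_version=SCHEMA_VERSION):
    """Hash schema version, issue kind, and dimensions into a stable id."""
    payload = json.dumps(
        {"schema_version": schema_version, "kind": kind, "dimensions": dimensions},
        sort_keys=True,          # key order must not affect the id
        separators=(",", ":"),   # fixed separators avoid whitespace drift
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return digest[:16]  # truncation length is an assumption
```

Sorting keys and fixing separators means two callers that build the same dimensions in a different order still produce the same id, while a schema version bump deliberately produces a new one.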
OOM fingerprints use:
- backend
- reason
OOM details and evidence keep volatile or inconsistently available fields such as exception module/type, collector, device id, rank, bundle path, event count, session status, context, and exact timestamps. This lets the same OOM group together whether it is discovered from an OOM bundle manifest or from telemetry events.
Collector degradation fingerprints use:
- collector
- backend
- health_status
- sorted partial_fields
- normalized error_stem
Collector details keep retry timestamps, consecutive failure counts, and source event metadata.
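The collector-degradation dimensions above could be assembled roughly as follows. The normalization rule for `error_stem` shown here (take the exception type name before the first colon and lowercase it) is an assumption chosen to match the `runtimeerror` value in the worked examples; both helper names are hypothetical.

```python
import re


def normalize_error_stem(error_text):
    """Reduce an error string to a stable, lowercase type stem (assumed rule)."""
    head = error_text.split(":", 1)[0]        # drop the volatile message part
    type_name = head.rsplit(".", 1)[-1]       # drop any module prefix
    return re.sub(r"[^a-z0-9_]+", "", type_name.lower())


def collector_dimensions(collector, backend, health_status, partial_fields, error_text):
    """Build grouping dimensions; volatile values stay out of the result."""
    return {
        "collector": collector,
        "backend": backend,
        "health_status": health_status,
        "partial_fields": sorted(set(partial_fields)),  # order-independent
        "error_stem": normalize_error_stem(error_text),
    }
```

With this shape, two degraded samples that report the same missing fields in a different order, or the same exception type with a different message, fall into the same group.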
Alert fingerprints use:
- event_type
- severity
- collector
- backend
- normalized alert category
High-fragmentation alerts use the stable category high_fragmentation, so
High fragmentation: 40.0% and High fragmentation: 51.5% group together.
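A minimal sketch of this kind of category normalization is shown below. The pattern table, the fallback slug rule, and the name `alert_category` are assumptions for illustration; only the `high_fragmentation` category and the two example messages come from the text above.

```python
import re

# Known message shapes mapped to stable categories (illustrative table).
_CATEGORY_PATTERNS = [
    (re.compile(r"^high fragmentation\b", re.IGNORECASE), "high_fragmentation"),
]


def alert_category(message):
    """Map an alert message to a stable category that ignores magnitudes."""
    text = message.strip()
    for pattern, category in _CATEGORY_PATTERNS:
        if pattern.search(text):
            return category
    # Assumed fallback: strip numbers and percentages, then slugify the rest.
    stem = re.sub(r"\d+(?:\.\d+)?%?", "", text.lower())
    slug = re.sub(r"[^a-z]+", "_", stem).strip("_")
    return slug or "uncategorized"
```

The explicit table keeps well-known alerts on fixed category names, while the fallback keeps unknown alerts from fragmenting into one group per metric value.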
Hidden-memory anomaly fingerprints use:
- classification: transient_spike, persistent_drift, or fragmentation_like
- severity
- stable phase summary when available
- collector
- backend
Hidden-memory details keep confidence, z-score, slope, gap bytes, fragmentation ratios, sample counts, and phase-attribution payloads.
Worked Examples
OOM bundle:
{
"kind": "oom",
"dimensions": {
"backend": "cuda",
"reason": "message_pattern:out of memory"
}
}
Collector degradation:
{
"kind": "collector_degradation",
"dimensions": {
"backend": "cuda",
"collector": "stormlog.cuda_tracker",
"error_stem": "runtimeerror",
"health_status": "degraded",
"partial_fields": ["device_free_bytes"]
}
}
High-fragmentation alert:
{
"kind": "alert",
"dimensions": {
"backend": "cuda",
"category": "high_fragmentation",
"collector": "stormlog.cuda_tracker",
"event_type": "warning",
"severity": "warning"
}
}
Hidden-memory drift:
{
"kind": "hidden_memory_anomaly",
"dimensions": {
"backend": "cuda",
"classification": "persistent_drift",
"collector": "stormlog.cuda_tracker",
"phase": "train / forward",
"severity": "critical"
}
}
Where Issues Live
In v1, grouped issues are derived at query time:
stormlog query issues ./live_sink ./oom_dumps --json
Python callers can use:
import stormlog.query
store = stormlog.query.open(["./live_sink", "./oom_dumps"])
issues = store.list_issues()
A future persistence pass should write a derived artifact-level issues.json
sidecar next to the artifact set. That sidecar should contain grouped issue
state and cached summaries only. It should not rewrite telemetry events, sink
segments, diagnose manifests, or OOM bundle manifests.
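Since the sidecar does not exist yet, the following is only a sketch of what writing and reloading it might look like. The JSON layout, the `schema_version` key, and both function names are assumptions; the only constraints taken from the text are the `issues.json` filename, its location next to the artifact set, and that it holds grouped issue state and cached summaries rather than raw telemetry.

```python
import json
from pathlib import Path

SIDECAR_NAME = "issues.json"


def write_sidecar(artifact_dir, issues):
    """Write grouped issue state and summaries next to the artifact set."""
    path = Path(artifact_dir) / SIDECAR_NAME
    payload = {"schema_version": 1, "issues": issues}  # layout is assumed
    tmp = path.with_name(SIDECAR_NAME + ".tmp")
    tmp.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    tmp.replace(path)  # swap in one step so readers never see a partial file
    return path


def load_state_overrides(artifact_dir):
    """Return {fingerprint_id: state} from the sidecar, or {} if it is absent."""
    path = Path(artifact_dir) / SIDECAR_NAME
    if not path.exists():
        return {}
    payload = json.loads(path.read_text(encoding="utf-8"))
    return {row["fingerprint_id"]: row["state"] for row in payload.get("issues", [])}
```

Only the sidecar file is ever written here; telemetry events, sink segments, and bundle manifests in the same directory are left untouched.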
Follow-On Tasks
- Persist and reload issues.json state overrides by fingerprint_id.
- Add TUI issue tables and issue-detail panes backed by list_issues().
- Add issue-oriented report schema fields for agent automation.
- Add regression detection by comparing current issue fingerprints with a previous persisted sidecar.
- Add controls for ignoring known noisy fingerprints from CLI/TUI surfaces.
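The regression-detection task could be approached along these lines. This sketch assumes a regression means "a fingerprint that was marked resolved in the previous sidecar and appears again in the current query"; that definition and the function name are assumptions, not settled design.

```python
def detect_regressions(previous_states, current_fingerprint_ids):
    """Split current fingerprints into regressed and newly seen ids.

    `previous_states` maps fingerprint ids to their persisted state;
    `current_fingerprint_ids` is an iterable of ids from the current query.
    """
    current = set(current_fingerprint_ids)
    regressed = sorted(fp for fp in current if previous_states.get(fp) == "resolved")
    new = sorted(fp for fp in current if fp not in previous_states)
    return {"regressed": regressed, "new": new}
```

Ignored fingerprints are deliberately left out of both lists, so a known noisy issue that recurs does not resurface as a regression.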