Kover A/B Report & Gate Verdict
kover specs/kover/ab-report.kmd
OUTPUT contract for Koder Kover: the stable JSON that `kover ab --json` and `kover gate --json` emit for downstream consumers (CI perf-gate, Kortex bundle, dashboards). `protocol.kmd` defines how a program is OBSERVED (input); this spec defines what Kover PRODUCES from an A/B comparison: the `Report` (median + IQR per metric, with a significance flag) and the `Verdict` (the gate result per budget). The schema is versioned and forward-compatible; every `R*` rule is testable.
When this spec applies
Primary triggers
- Consuming Kover's JSON output (Report/Verdict)
All triggers
- Consuming the output of `kover ab --json` or `kover gate --json`
- Implementing a CI perf-gate on top of the Kover Report
- Attaching the Kover A/B to the Kortex bundle
Specification body
Spec — Kover A/B Report & Gate Verdict (output contract v0.1)
This spec defines the machine-readable output of a Kover A/B run: the `Report` (`kover ab --json`) and the gate `Verdict` (`kover gate --json`). Where `protocol.kmd` is the input contract (how a program is observed), this is the output contract (what Kover emits). It is the surface a CI perf-gate, the Kortex bundle, and dashboards consume. Every rule `R*` is testable; the tests `T*` are listed at the end.
Scope
Applies to any consumer of a Kover comparison: CI pipelines that gate on
performance, the Kortex handoff (RFC-001 §8), and dashboards. The contract is
mode-agnostic — a single run is repeats=1 (every Stat has iqr=0), so
one schema serves a one-off comparison and an N-run benchmark.
R1 — The metric set is closed and ordered
R1.1 — A Report carries this closed, ordered metric set, with render metrics first and resources after:
| Metric | Meaning | Unit |
|---|---|---|
| fcp_ms | first-contentful-paint | ms |
| dom_interactive_ms | navigation domInteractive | ms |
| load_ms | navigation loadEventEnd | ms |
| lcp_ms | largest-contentful-paint (Core Web Vital) | ms |
| cls | cumulative layout shift (Core Web Vital) | score (unitless) |
| rss_mb | resident set, whole process tree | MiB |
| cpu_pct | CPU%, whole process tree | percent |
R1.2 — The same metric descriptor set drives both the single-run delta and the repeated median (one source of truth) — a producer MUST NOT emit a metric in one mode that it omits in the other.
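As a hedged illustration of R1.1, a producer-side conformance check (in the spirit of T1) could compare a Report's metric names against the table above. The names `EXPECTED_METRICS` and `metric_order_problems` are hypothetical and not part of Kover; only the metric identifiers come from this spec.

```python
# Metric identifiers in R1.1 order: render metrics first, then resources.
EXPECTED_METRICS = [
    "fcp_ms", "dom_interactive_ms", "load_ms", "lcp_ms", "cls",
    "rss_mb", "cpu_pct",
]

def metric_order_problems(report):
    """Return human-readable problems; an empty list means the set matches R1.1."""
    names = [m["metric"] for m in report["metrics"]]
    problems = []
    for name in EXPECTED_METRICS:
        if name not in names:
            problems.append(f"missing metric: {name}")
    for name in names:
        if name not in EXPECTED_METRICS:
            problems.append(f"unexpected metric: {name}")
    # Same members but a different sequence (or a repeated entry).
    if not problems and names != EXPECTED_METRICS:
        problems.append("metrics present but not in R1.1 order (or duplicated)")
    return problems
```

A downstream consumer would normally be more lenient than this check, since R6.1 allows new metrics to be added over time.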
R2 — Stat: median + inter-quartile spread
R2.1 — Each target's value for a metric is a Stat:
{ "median": 308.0, "p25": 303.0, "p75": 314.0, "min": 298.0, "max": 320.0, "n": 3 }
R2.2 — Quartiles use linear interpolation between closest ranks (the
numpy/"type-7" default). iqr = p75 − p25 is the metric's spread. With n=1
every quantile equals the single value and iqr=0.
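The interpolation rule in R2.2 can be sketched in a few lines. This is a minimal illustration, not the reference encoder: `quantile_type7`, `make_stat`, and `iqr` are hypothetical helper names, and the sample values are made up.

```python
import math

def quantile_type7(samples, p):
    """Linear interpolation between closest ranks (the "type-7" rule)."""
    xs = sorted(samples)
    if not xs:
        raise ValueError("need at least one sample")
    h = (len(xs) - 1) * p          # fractional rank in the sorted samples
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # clamp so n=1 and p=1.0 stay in range
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def make_stat(samples):
    """Build a Stat-shaped dict (R2.1 field names) from all samples."""
    return {
        "median": quantile_type7(samples, 0.5),
        "p25": quantile_type7(samples, 0.25),
        "p75": quantile_type7(samples, 0.75),
        "min": min(samples),
        "max": max(samples),
        "n": len(samples),
    }

def iqr(stat):
    """Spread of a Stat: p75 minus p25."""
    return stat["p75"] - stat["p25"]

four_runs = make_stat([330.0, 300.0, 320.0, 310.0])  # made-up samples
single_run = make_stat([42.0])
```

For the four made-up samples the sketch gives a median of 315.0, p25 of 307.5 and p75 of 322.5; for the single sample every quantile is 42.0, so the IQR is 0.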
R2.3 — A single Kover run is noise; the benchmark signal is the median plus
the IQR over `repeats` runs (perf-baseline.md: report median + IQR, never a
lone run). A producer of a multi-run Report MUST populate each Stat from all
repeats, not just the last one.
R3 — MetricStats and the significance rule
R3.1 — One metric across both targets:
{ "metric": "load_ms", "a": <Stat>, "b": <Stat>,
"delta_median": 6.8, "significant": true }
delta_median = a.median − b.median (positive ⇒ target A is heavier/slower).
R3.2 — significant is true iff |delta_median| exceeds BOTH targets' IQR.
This is the "real difference vs run-to-run jitter?" rule. It is a serialized
field, not a hint: a consumer reads it directly and MUST NOT recompute a
different significance from the quartiles. With n=1 (both iqr=0) any non-zero
delta is significant.
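To make the rule concrete, here is a producer-side sketch of how the flag could be derived from two Stats. It assumes "exceeds" means strictly greater than, and `is_significant` is a hypothetical name. Consumers should still read the serialized field rather than run something like this.

```python
def is_significant(a, b):
    """True iff |a.median - b.median| is strictly above BOTH targets' IQR."""
    delta = abs(a["median"] - b["median"])
    iqr_a = a["p75"] - a["p25"]
    iqr_b = b["p75"] - b["p25"]
    return delta > iqr_a and delta > iqr_b

# Made-up Stats, trimmed to the fields this sketch reads.
noisy_a = {"median": 308.0, "p25": 303.0, "p75": 314.0}   # IQR 11
steady_b = {"median": 290.0, "p25": 288.0, "p75": 293.0}  # IQR 5
close_b = {"median": 300.0, "p25": 298.0, "p75": 303.0}   # IQR 5

print(is_significant(noisy_a, steady_b))  # delta 18 is above both IQRs
print(is_significant(noisy_a, close_b))   # delta 8 is below A's IQR of 11
```

The first call prints `True` and the second `False`: a delta that clears only one target's spread is still treated as jitter.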
R4 — Report: the full A/B result
R4.1 — kover ab --json emits:
{ "url": "scenario:flow.json", "primary": "kruze", "secondary": "chrome",
"repeats": 3, "metrics": [ <MetricStats>, … ] }
R4.2 — primary is target A, secondary is target B — fixed, so delta_median
signs are stable across consumers. url is the page (or scenario:<file> when a
scenario drove the run).
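Because the A/B roles are fixed, a consumer can interpret the sign of `delta_median` without extra context. The sketch below is one way to do that; `describe` is a hypothetical helper, and the sample Report is trimmed to the fields the helper reads (the `rss_mb` numbers are made up).

```python
def describe(report):
    """Summarize each metric using the R4.2 sign convention (A minus B)."""
    lines = []
    for m in report["metrics"]:
        delta = m["delta_median"]
        if delta > 0:
            who = f'{report["primary"]} is heavier/slower than {report["secondary"]}'
        elif delta < 0:
            who = f'{report["secondary"]} is heavier/slower than {report["primary"]}'
        else:
            who = "no difference in medians"
        tag = "significant" if m["significant"] else "within jitter"
        lines.append(f'{m["metric"]}: {who} (delta_median={delta}, {tag})')
    return lines

report = {
    "url": "scenario:flow.json", "primary": "kruze", "secondary": "chrome",
    "repeats": 3,
    "metrics": [
        {"metric": "load_ms", "delta_median": 6.8, "significant": True},
        {"metric": "rss_mb", "delta_median": -12.5, "significant": False},
    ],
}

for line in describe(report):
    print(line)
```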
R5 — Verdict: the gate result over a budget
R5.1 — A budget sets per-metric ceilings on the A−B median delta:
{ "metrics": { "load_ms": { "max_delta": 50 }, "rss_mb": { "max_delta": 100 } } }
A metric absent from the budget is reported but never gates the build.
R5.2 — kover gate --json emits a Verdict:
{ "pass": false, "results": [
{ "metric": "load_ms", "delta": 75.0, "budget": 50.0,
"gated": true, "significant": true, "regressed": true }, … ] }
R5.3 — A metric's regressed flag is true iff the metric is gated AND delta > max_delta AND significant. An over-budget delta that is not significant is run-to-run jitter and MUST
NOT regress (anti-flaky-gate). Verdict.pass is false iff at least one metric
regressed.
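The gate logic of R5.1 to R5.3 can be sketched as a small function over a Report and a budget. This is an illustration under assumptions, not the `kover gate` implementation: `evaluate_gate` is a hypothetical name, and emitting `None` as the `budget` of an ungated metric is a guess, since the spec does not say what that field holds in that case.

```python
def evaluate_gate(report, budget):
    """Derive a Verdict-shaped dict from MetricStats entries and a budget."""
    results = []
    for m in report["metrics"]:
        limit = budget["metrics"].get(m["metric"])
        gated = limit is not None  # absent from the budget: reported, never gates
        max_delta = limit["max_delta"] if gated else None  # None is an assumption
        delta = m["delta_median"]
        # R5.3: all three conditions must hold for a regression.
        regressed = bool(gated and delta > max_delta and m["significant"])
        results.append({
            "metric": m["metric"], "delta": delta, "budget": max_delta,
            "gated": gated, "significant": m["significant"],
            "regressed": regressed,
        })
    return {"pass": not any(r["regressed"] for r in results), "results": results}

budget = {"metrics": {"load_ms": {"max_delta": 50}, "rss_mb": {"max_delta": 100}}}
report = {"metrics": [  # trimmed to the fields read here; numbers are made up
    {"metric": "load_ms", "delta_median": 75.0, "significant": True},
    {"metric": "rss_mb", "delta_median": 120.0, "significant": False},
    {"metric": "cpu_pct", "delta_median": 500.0, "significant": True},
]}
verdict = evaluate_gate(report, budget)
```

In this made-up run only `load_ms` regresses: `rss_mb` is over budget but not significant, and `cpu_pct` has no budget entry, so neither can fail the gate.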
R5.4 — A failed scenario assertion (a kover gate --scenario run whose
replay failed; scenario-dsl.kmd R1.1 assert) fails the gate distinctly
from a budget regression — the producer signals it as a scenario failure, not a
regressed metric. A consumer MUST treat a non-zero gate exit without a
regressed metric as a correctness failure, not a perf regression.
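A consumer-side reading of R5.4 might look like the sketch below. It assumes that exit code 0 means the gate passed and that the Verdict may be missing when no JSON was emitted; neither is stated by this spec. The function name `classify_gate` and the three return labels are hypothetical.

```python
def classify_gate(exit_code, verdict):
    """Map a gate run to 'pass', 'perf-regression' or 'correctness-failure'."""
    if exit_code == 0:  # assumption: zero exit means the gate passed
        return "pass"
    results = (verdict or {}).get("results", [])
    if any(r.get("regressed") for r in results):
        return "perf-regression"
    # Non-zero exit with no regressed metric: treat as a correctness failure.
    return "correctness-failure"

regressed_verdict = {"pass": False, "results": [
    {"metric": "load_ms", "delta": 75.0, "budget": 50.0,
     "gated": True, "significant": True, "regressed": True},
]}
clean_verdict = {"pass": True, "results": [
    {"metric": "load_ms", "delta": 10.0, "budget": 50.0,
     "gated": True, "significant": True, "regressed": False},
]}
```

Keeping the two failure labels apart lets a pipeline route a broken scenario replay to whoever owns correctness instead of reporting it as a slowdown.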
R6 — Versioning & forward-compat
R6.1 — Unknown fields MUST be ignored by consumers (forward-compat), never fatal. New metrics are additive to the R1.1 set; a minor schema bump never removes or renames a field.
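One way for a consumer to satisfy R6.1 is to copy only the keys it understands and never validate against a closed key set. The sketch below assumes a hypothetical dashboard that plots two metrics; `read_report`, `plotted_deltas`, `future_field`, and `made_up_metric` are all invented for the example.

```python
import json

def read_report(text):
    """Parse a Report, keeping only the fields this consumer knows about."""
    doc = json.loads(text)
    metrics = {}
    for m in doc.get("metrics", []):
        # Known keys are copied by name; any other key is simply not read.
        metrics[m["metric"]] = {
            "delta_median": m["delta_median"],
            "significant": m["significant"],
        }
    return {"primary": doc["primary"], "secondary": doc["secondary"],
            "metrics": metrics}

# Metrics this hypothetical dashboard plots; unknown metrics are skipped.
PLOTTED = ("load_ms", "rss_mb")

def plotted_deltas(report):
    return {name: report["metrics"][name]["delta_median"]
            for name in PLOTTED if name in report["metrics"]}

newer_output = json.dumps({
    "url": "scenario:flow.json", "primary": "kruze", "secondary": "chrome",
    "repeats": 3, "future_field": 1,
    "metrics": [
        {"metric": "load_ms", "delta_median": 6.8, "significant": True,
         "future_field": "x"},
        {"metric": "made_up_metric", "delta_median": 1.0, "significant": False},
    ],
})
print(plotted_deltas(read_report(newer_output)))
```

The unknown top-level field, the unknown per-metric field, and the unknown metric are all tolerated; only `load_ms` ends up in the plotted result.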
Test cases
| # | Check | Severity |
|---|---|---|
| T1 | A Report lists exactly the R1.1 metrics, in order, render-before-resources. | hard |
| T2 | Stat quartiles match type-7 interpolation; iqr = p75 − p25; n=1 ⇒ iqr=0. | hard |
| T3 | significant is true iff \|delta_median\| exceeds BOTH targets' IQR (R3.2). | hard |
| T4 | delta_median = a.median − b.median with primary=A, secondary=B. | hard |
| T5 | A regressed metric satisfies gated ∧ delta>max_delta ∧ significant; over-budget-not-significant does not regress (R5.3). | hard |
| T6 | Verdict.pass=false iff some metric regressed; a scenario-assertion failure is signalled distinctly (R5.4). | hard |
| T7 | A consumer ignores an unknown field/metric without erroring (R6.1). | soft |
Non-goals
- The connector / input contract, owned by protocol.kmd.
- Scenario step semantics, owned by scenario-dsl.kmd.
- Capture byte storage, owned by capture.kmd.
- The measurement method (how FCP/RSS/CPU are sampled), an implementation detail of products/dev/kover and not part of the wire contract.
Notes
The reference producer is the kover CLI (products/dev/kover): kover ab --json (Report), kover gate --json (Verdict). The Go types ab.Report,
ab.MetricStats, ab.Stat, and gate.Verdict are the canonical encoders.
References
- meta/docs/stack/specs/kover/protocol.kmd
- meta/docs/stack/specs/kover/scenario-dsl.kmd
- meta/docs/stack/rfcs/kover-RFC-001-foundations.kmd
- meta/docs/stack/registries/perf-baseline.md
- products/dev/kover