Kover A/B Report & Gate Verdict

kover specs/kover/ab-report.kmd

Koder Kover's OUTPUT contract: the stable JSON that `kover ab --json` and `kover gate --json` emit for downstream consumers (the CI perf-gate, the Kortex bundle, dashboards). `protocol.kmd` defines how a program is OBSERVED (input); this spec defines what Kover PRODUCES from an A/B comparison: the `Report` (median + IQR per metric, with a significance flag) and the `Verdict` (the gate result against a budget). The schema is versioned and forward-compatible; every `R*` rule is testable.

When this spec applies

Primary triggers

Consuming Kover's JSON output (Report/Verdict)

All triggers

Consuming the output of `kover ab --json` or `kover gate --json`
Implementing a CI perf-gate on top of Kover's Report
Attaching Kover's A/B result to the Kortex bundle

Spec — Kover A/B Report & Gate Verdict (output contract v0.1)

This spec defines the machine-readable output of a Kover A/B run: the Report (kover ab --json) and the gate Verdict (kover gate --json). Where protocol.kmd is the input contract (how a program is observed), this is the output contract (what Kover emits). It is the surface a CI perf-gate, the Kortex bundle, and dashboards consume. Every rule R* is testable; tests T* at the end.

Scope

Applies to any consumer of a Kover comparison: CI pipelines that gate on performance, the Kortex handoff (RFC-001 §8), and dashboards. The contract is mode-agnostic — a single run is repeats=1 (every Stat has iqr=0), so one schema serves a one-off comparison and an N-run benchmark.

R1 — The metric set is closed and ordered

R1.1 — A Report reports this closed, ordered metric set. Render metrics first, then resources:

`metric`	Meaning	Unit
`fcp_ms`	first-contentful-paint	ms
`dom_interactive_ms`	navigation `domInteractive`	ms
`load_ms`	navigation `loadEventEnd`	ms
`lcp_ms`	largest-contentful-paint (Core Web Vital)	ms
`cls`	cumulative layout shift (Core Web Vital)	score (unitless)
`rss_mb`	resident set, whole process tree	MiB
`cpu_pct`	CPU%, whole process tree	percent

R1.2 — The same metric descriptor set drives both the single-run delta and the repeated median (one source of truth) — a producer MUST NOT emit a metric in one mode that it omits in the other.
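One way to honour R1.1 and R1.2 together is a single ordered descriptor slice that every code path iterates. The sketch below assumes a hypothetical `MetricDesc` type and a hypothetical `Sample` map of raw readings; neither is the real `products/dev/kover` code. Because the single-run delta and the repeated-median path would both call `Deltas`, neither mode can emit a metric the other omits.

```go
package main

// MetricDesc is a hypothetical descriptor for one R1.1 metric.
type MetricDesc struct {
	Name   string // wire name, e.g. "fcp_ms"
	Unit   string // "ms", "score", "MiB" or "percent"
	Render bool   // true for render metrics, false for resource metrics
}

// Metrics is the single ordered source of truth: render metrics first,
// then resources, exactly in R1.1 order.
var Metrics = []MetricDesc{
	{"fcp_ms", "ms", true},
	{"dom_interactive_ms", "ms", true},
	{"load_ms", "ms", true},
	{"lcp_ms", "ms", true},
	{"cls", "score", true},
	{"rss_mb", "MiB", false},
	{"cpu_pct", "percent", false},
}

// Sample is a hypothetical per-target reading keyed by metric name: raw values
// for a single run, or per-metric medians for a repeated run.
type Sample map[string]float64

// Deltas returns A−B for every metric in R1.1 order. Both modes call it,
// so the emitted metric set is identical in both (R1.2).
func Deltas(a, b Sample) []float64 {
	out := make([]float64, 0, len(Metrics))
	for _, m := range Metrics {
		out = append(out, a[m.Name]-b[m.Name])
	}
	return out
}

// RenderBeforeResources reports whether no render metric follows a resource metric.
func RenderBeforeResources(ms []MetricDesc) bool {
	seenResource := false
	for _, m := range ms {
		if !m.Render {
			seenResource = true
		} else if seenResource {
			return false
		}
	}
	return true
}
```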

R2 — `Stat`: median + inter-quartile spread

R2.1 — Each target's value for a metric is a Stat:

{ "median": 308.0, "p25": 303.0, "p75": 314.0, "min": 298.0, "max": 320.0, "n": 3 }

R2.2 — Quartiles use linear interpolation between closest ranks (the numpy/"type-7" default). iqr = p75 − p25 is the metric's spread. With n=1 every quantile equals the single value and iqr=0.
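The type-7 rule sets h = (n−1)·p and interpolates linearly between the sorted samples at floor(h) and floor(h)+1. This matches NumPy's default `percentile`/`quantile` method. The sketch below is a consumer-side or test-side reimplementation. The `Stat` field names follow the R2.1 JSON, and `NewStat`/`quantile7` are hypothetical helpers, not the canonical `ab.Stat` encoder. It also honours R2.3 by building the Stat from every repeat.

```go
package main

import (
	"math"
	"sort"
)

// Stat mirrors the R2.1 object; tags are taken from the JSON example.
type Stat struct {
	Median float64 `json:"median"`
	P25    float64 `json:"p25"`
	P75    float64 `json:"p75"`
	Min    float64 `json:"min"`
	Max    float64 `json:"max"`
	N      int     `json:"n"`
}

// IQR is derived (p75 − p25); it is not a field in the R2.1 example.
func (s Stat) IQR() float64 { return s.P75 - s.P25 }

// quantile7 returns the type-7 quantile of an ascending, non-empty slice.
func quantile7(sorted []float64, p float64) float64 {
	h := float64(len(sorted)-1) * p
	lo := int(math.Floor(h))
	if lo+1 >= len(sorted) {
		return sorted[lo] // n=1 (or p=1): every quantile is the last value
	}
	return sorted[lo] + (h-float64(lo))*(sorted[lo+1]-sorted[lo])
}

// NewStat summarises ALL repeats (R2.3), not just the last run.
// It panics on an empty input.
func NewStat(samples []float64) Stat {
	if len(samples) == 0 {
		panic("NewStat: no samples")
	}
	xs := append([]float64(nil), samples...) // do not mutate the caller's slice
	sort.Float64s(xs)
	return Stat{
		Median: quantile7(xs, 0.5),
		P25:    quantile7(xs, 0.25),
		P75:    quantile7(xs, 0.75),
		Min:    xs[0],
		Max:    xs[len(xs)-1],
		N:      len(xs),
	}
}
```

For the samples {298, 308, 320}, `NewStat` gives median 308, p25 303, p75 314 (iqr 11). For a single sample, every quantile equals that value and the iqr is 0.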

R2.3 — A single Kover run is noise; the benchmark signal is the median plus the IQR over `repeats` runs (perf-baseline.md: report median + IQR, never a lone run). A producer of a multi-run Report MUST populate each Stat from all repeats, not just the last one.

R3 — `MetricStats` and the significance rule

R3.1 — One metric across both targets:

{ "metric": "load_ms", "a": <Stat>, "b": <Stat>,
  "delta_median": 6.8, "significant": true }

delta_median = a.median − b.median (positive ⇒ target A is heavier/slower).

R3.2 — significant is true iff |delta_median| exceeds BOTH targets' IQR. This is the "real difference vs run-to-run jitter?" rule. It is a serialized field, not a hint: a consumer reads it directly and MUST NOT recompute a different significance from the quartiles. With n=1 (both iqr=0) any non-zero delta is significant.
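R3.1 and R3.2 reduce to two small functions. The names `DeltaMedian` and `Significant` are hypothetical. The sketch reads "exceeds" as strictly greater, so a delta exactly equal to an IQR is not significant. With n=1 both IQRs are 0, so any non-zero delta passes.

```go
package main

import "math"

// DeltaMedian applies R3.1: positive means target A (primary) is heavier/slower.
func DeltaMedian(medianA, medianB float64) float64 {
	return medianA - medianB
}

// Significant applies R3.2: |delta_median| must strictly exceed BOTH targets' IQR.
func Significant(deltaMedian, iqrA, iqrB float64) bool {
	d := math.Abs(deltaMedian)
	return d > iqrA && d > iqrB
}
```

A producer serialises the result of `Significant` into the `significant` field. Per R3.2, a consumer reads that field and does not re-derive it.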

R4 — `Report`: the full A/B result

R4.1 — kover ab --json emits:

{ "url": "scenario:flow.json", "primary": "kruze", "secondary": "chrome",
  "repeats": 3, "metrics": [ <MetricStats>, … ] }

R4.2 — primary is target A, secondary is target B — fixed, so delta_median signs are stable across consumers. url is the page (or scenario:<file> when a scenario drove the run).
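A downstream consumer can decode the Report with plain `encoding/json`. The structs below are a consumer-side mirror whose field names and tags are inferred from the R2.1, R3.1 and R4.1 JSON examples. They are an assumption, not the canonical `ab.Report`/`ab.MetricStats`/`ab.Stat` types, and `ParseReport` is a hypothetical helper.

```go
package main

import "encoding/json"

// Stat mirrors R2.1.
type Stat struct {
	Median float64 `json:"median"`
	P25    float64 `json:"p25"`
	P75    float64 `json:"p75"`
	Min    float64 `json:"min"`
	Max    float64 `json:"max"`
	N      int     `json:"n"`
}

// MetricStats mirrors R3.1: one metric across both targets.
type MetricStats struct {
	Metric      string  `json:"metric"`
	A           Stat    `json:"a"` // primary
	B           Stat    `json:"b"` // secondary
	DeltaMedian float64 `json:"delta_median"`
	Significant bool    `json:"significant"`
}

// Report mirrors R4.1.
type Report struct {
	URL       string        `json:"url"`
	Primary   string        `json:"primary"`
	Secondary string        `json:"secondary"`
	Repeats   int           `json:"repeats"`
	Metrics   []MetricStats `json:"metrics"`
}

// ParseReport decodes the output of `kover ab --json`.
func ParseReport(data []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(data, &r)
	return r, err
}
```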

R5 — `Verdict`: the gate result over a budget

R5.1 — A budget is a set of per-metric ceilings on the A−B median delta:

{ "metrics": { "load_ms": { "max_delta": 50 }, "rss_mb": { "max_delta": 100 } } }

A metric absent from the budget is reported but never gates the build.

R5.2 — kover gate --json emits a Verdict:

{ "pass": false, "results": [
  { "metric": "load_ms", "delta": 75.0, "budget": 50.0,
    "gated": true, "significant": true, "regressed": true }, … ] }

R5.3 — A metric's `regressed` is true iff it is gated AND delta > max_delta AND significant. An over-budget delta that is not significant is run-to-run jitter and MUST NOT regress (anti-flaky-gate). `Verdict.pass` is false iff any metric regressed.
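R5.1 to R5.3 can be sketched as one evaluation pass over the metrics. The `Budget`, `Result` and `Verdict` tags follow the JSON examples above. `Input` and `Evaluate` are hypothetical, and this is not the canonical `gate.Verdict` producer. The spec does not say what `budget` holds for an ungated metric; this sketch assumes it reports 0.

```go
package main

// Ceiling and Budget mirror the R5.1 budget file.
type Ceiling struct {
	MaxDelta float64 `json:"max_delta"`
}

type Budget struct {
	Metrics map[string]Ceiling `json:"metrics"`
}

// Input is a hypothetical reduced view of one MetricStats entry.
type Input struct {
	Metric      string
	DeltaMedian float64
	Significant bool
}

// Result and Verdict mirror the R5.2 output.
type Result struct {
	Metric      string  `json:"metric"`
	Delta       float64 `json:"delta"`
	Budget      float64 `json:"budget"` // assumed 0 when the metric is ungated
	Gated       bool    `json:"gated"`
	Significant bool    `json:"significant"`
	Regressed   bool    `json:"regressed"`
}

type Verdict struct {
	Pass    bool     `json:"pass"`
	Results []Result `json:"results"`
}

// Evaluate applies R5.3: regressed = gated && delta > max_delta && significant.
// Metrics absent from the budget are still reported but never gate (R5.1).
func Evaluate(ms []Input, b Budget) Verdict {
	v := Verdict{Pass: true}
	for _, m := range ms {
		c, gated := b.Metrics[m.Metric] // lookup on a nil map is safe
		r := Result{
			Metric:      m.Metric,
			Delta:       m.DeltaMedian,
			Budget:      c.MaxDelta,
			Gated:       gated,
			Significant: m.Significant,
		}
		r.Regressed = gated && m.DeltaMedian > c.MaxDelta && m.Significant
		if r.Regressed {
			v.Pass = false
		}
		v.Results = append(v.Results, r)
	}
	return v
}
```

Requiring `significant` is the anti-flaky guard in this sketch: an over-budget but noisy delta is reported in `results` yet leaves `pass` untouched.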

R5.4 — A failed scenario assertion (a kover gate --scenario run whose replay failed; scenario-dsl.kmd R1.1 assert) fails the gate distinctly from a budget regression — the producer signals it as a scenario failure, not a regressed metric. A consumer MUST treat a non-zero gate exit without a regressed metric as a correctness failure, not a perf regression.
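On the consumer side, R5.4 amounts to a small classification step. The sketch below assumes the usual convention that exit code 0 means success. `Outcome`, `Classify` and the reduced `Verdict` shape are hypothetical, and the spec does not fix specific non-zero exit codes.

```go
package main

// Outcome is a hypothetical consumer-side classification of a gate run.
type Outcome int

const (
	Passed             Outcome = iota
	PerfRegression             // at least one metric regressed (R5.3)
	CorrectnessFailure         // non-zero exit with no regressed metric (R5.4)
)

// Result and Verdict are reduced mirrors of the R5.2 output.
type Result struct {
	Metric    string
	Regressed bool
}

type Verdict struct {
	Pass    bool
	Results []Result
}

// Classify separates a perf regression from a correctness failure such as
// a failed scenario assertion. Exit code 0 is assumed to mean success.
func Classify(exitCode int, v Verdict) Outcome {
	if exitCode == 0 {
		return Passed
	}
	for _, r := range v.Results {
		if r.Regressed {
			return PerfRegression
		}
	}
	return CorrectnessFailure
}
```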

R6 — Versioning & forward-compat

R6.1 — Unknown fields MUST be ignored by consumers (forward-compat), never fatal. New metrics are additive to the R1.1 set; a minor schema bump never removes or renames a field.
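In Go, R6.1 falls out of `encoding/json`, which ignores unknown object fields unless the consumer opts in to `(*json.Decoder).DisallowUnknownFields`, so a forward-compatible consumer simply never calls it. Unknown metrics need one explicit step: skip them instead of failing. The sketch below is an assumption throughout. `DecodeTolerant` and the reduced structs are hypothetical, and `inp_ms` in the test stands in for some future metric.

```go
package main

import "encoding/json"

type metricEntry struct {
	Metric      string  `json:"metric"`
	DeltaMedian float64 `json:"delta_median"`
	Significant bool    `json:"significant"`
}

type report struct {
	Repeats int           `json:"repeats"`
	Metrics []metricEntry `json:"metrics"`
}

// known is the R1.1 set this consumer was built against.
var known = map[string]bool{
	"fcp_ms": true, "dom_interactive_ms": true, "load_ms": true,
	"lcp_ms": true, "cls": true, "rss_mb": true, "cpu_pct": true,
}

// DecodeTolerant decodes without DisallowUnknownFields, so unknown fields are
// silently dropped, and splits metrics into understood and skipped ones.
func DecodeTolerant(data []byte) (kept []metricEntry, skipped []string, err error) {
	var r report
	if err = json.Unmarshal(data, &r); err != nil {
		return nil, nil, err
	}
	for _, m := range r.Metrics {
		if known[m.Metric] {
			kept = append(kept, m)
		} else {
			skipped = append(skipped, m.Metric)
		}
	}
	return kept, skipped, nil
}
```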

Test cases

#	Check	Severity
T1	A `Report` lists exactly the R1.1 metrics, in order, render-before-resources.	hard
T2	`Stat` quartiles match type-7 interpolation; `iqr = p75 − p25`; `n=1 ⇒ iqr=0`.	hard
T3	`significant` is true iff `|delta_median|` exceeds both targets' IQR; with n=1, any non-zero delta is significant (R3.2).	hard
T4	`delta_median = a.median − b.median` with `primary`=A, `secondary`=B.	hard
T5	A `regressed` metric satisfies `gated ∧ delta>max_delta ∧ significant`; over-budget-not-significant does not regress (R5.3).	hard
T6	`Verdict.pass=false` iff some metric regressed; a scenario-assertion failure is signalled distinctly (R5.4).	hard
T7	A consumer ignores an unknown field/metric without erroring (R6.1).	soft

Non-goals

The connector / input contract — owned by protocol.kmd.
Scenario step semantics — owned by scenario-dsl.kmd.
Capture byte storage — owned by capture.kmd.
The measurement method (how FCP/RSS/CPU are sampled) — implementation detail of products/dev/kover, not part of the wire contract.

Notes

The reference producer is the kover CLI (products/dev/kover): kover ab --json (Report), kover gate --json (Verdict). The Go types ab.Report, ab.MetricStats, ab.Stat, and gate.Verdict are the canonical encoders.

References

meta/docs/stack/specs/kover/protocol.kmd
meta/docs/stack/specs/kover/scenario-dsl.kmd
meta/docs/stack/rfcs/kover-RFC-001-foundations.kmd
meta/docs/stack/registries/perf-baseline.md
products/dev/kover