
Always-on test recipes T1-T9

testing specs/testing/always-on-recipes.kmd

Concrete recipes for the 9 mandatory test templates in `policies/always-on.kmd § Templates de teste mandatórios`. Each recipe has setup, run commands, asserts, and known calibration. Components copy, paste, and tweak; they do not reinvent.

Specification body

Spec: Always-on test recipes T1–T9

Status: draft v0.1 (2026-05-24). Recipes validated in production in at least one component will be promoted to stable. Components consult this doc before writing T-suites to avoid reinvention.

Common conventions

  • Test host: heavy tests run in a VM on s.khost1 per policies/test-host-isolation.kmd; the recipes in this doc are no exception. Commands below assume PWD = repo root.
  • SDK headless: policies/headless-first.kmd R8 mandates reuse of engines/sdk/koder_test_*. Where it makes sense, recipes point to the SDK instead of shelling out.
  • Compat window: recipes assume the default R1.1 window (2 minor + 1 major + 180 days). Components that TIGHTEN adjust their matrices proportionally.
  • Language: snippets are in Go when the component is Go, in Dart for Flutter, and in Bash for orchestration. Adapt to the target's stack.

T1 — Matrix N × N-1 (R1.1, R1.2, R1.3)

Goal: every client↔server version combination inside the R1.1 window passes the component's smoke test, with no 4xx/5xx seen by the client.

Setup

# tests/compat/docker-compose.yml
services:
  server-N-2:
    image: ghcr.io/koder/<component>:${VERSION_N_MINUS_2}
    ports: ["18080:8080"]
  server-N-1:
    image: ghcr.io/koder/<component>:${VERSION_N_MINUS_1}
    ports: ["18081:8080"]
  server-N:
    image: ghcr.io/koder/<component>:${VERSION_N}
    ports: ["18082:8080"]

Run

# tests/compat/run-matrix.sh
set -euo pipefail
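versions=(N-2 N-1 N)
# Host ports published in tests/compat/docker-compose.yml
declare -A port=([N-2]=18080 [N-1]=18081 [N]=18082)
for c in "${versions[@]}"; do
  for s in "${versions[@]}"; do
    echo "== client=$c server=$s =="
    KODER_SERVER_URL="http://localhost:${port[$s]}" \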
    KODER_CLIENT_VERSION="$c" \
      go test ./tests/compat/... -run TestSmoke -tags compat
  done
done

Assert

  • All 3×3 = 9 combinations exit 0.
  • No 5xx in server logs.
  • No "schema mismatch" or "version too old" errors in client logs.

Notes

  • N-2 included: only if window_minor_versions ≥ 2 (Stack default). Components that TIGHTEN to N-3 add a 4th row/column.
  • Image source: GHCR is the example; substitute Hub registry (hub.koder.dev/apps/<slug>:<version>) for Koder-hosted artifacts.
  • Per-bug regression: when a compat bug is fixed, add a test under tests/compat/regression/ per policies/regression-tests.kmd.
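The matrix runner passes KODER_SERVER_URL and KODER_CLIENT_VERSION to TestSmoke, but the smoke check itself is left to each component. A minimal sketch of such a check follows, assuming the client announces its version in a header. The header name X-Koder-Client-Version, the smoke and fakeServer helpers, and the use of /healthz as the probed path are all hypothetical; an in-process httptest server stands in for the server-N containers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// smoke sends one request as the given client version and fails on any
// 4xx/5xx. The header name X-Koder-Client-Version is an assumption.
func smoke(serverURL, clientVersion string) error {
	req, err := http.NewRequest(http.MethodGet, serverURL+"/healthz", nil)
	if err != nil {
		return err
	}
	req.Header.Set("X-Koder-Client-Version", clientVersion)
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Errorf("client=%s got HTTP %d", clientVersion, resp.StatusCode)
	}
	return nil
}

// fakeServer stands in for one server container; it rejects a single
// client version so the failure path can be exercised.
func fakeServer(rejected string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("X-Koder-Client-Version") == rejected {
			http.Error(w, "version too old", http.StatusUpgradeRequired)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

func main() {
	srv := fakeServer("N-3")
	defer srv.Close()
	for _, c := range []string{"N-2", "N-1", "N"} {
		fmt.Println(c, smoke(srv.URL, c))
	}
}
```

In a real suite the URL and version would come from the environment variables set by run-matrix.sh instead of the in-process server.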

T2 — Unknown-field round-trip (R2.1, R2.3)

Goal: the parser preserves an unknown new field on re-emit; an unknown enum value degrades gracefully without a panic.

Setup

// internal/wire/unknown_field_test.go
//go:build compat

package wire

import (
    "bytes"
    "encoding/json"
    "testing"
)

// Sentinel payload with a future field the current parser doesn't model.
const futurePayload = `{
  "v": 3,
  "kind": "MessageDelivered",
  "payload": { "id": 42 },
  "future_field": { "nested": "must survive round-trip" }
}`

Run

go test ./internal/wire/... -run TestUnknownFieldRoundTrip -tags compat -v

Assert

func TestUnknownFieldRoundTrip(t *testing.T) {
    var doc map[string]any
    if err := json.Unmarshal([]byte(futurePayload), &doc); err != nil {
        t.Fatal(err)
    }
    out, err := json.Marshal(doc)
    if err != nil {
        t.Fatal(err)
    }
    if !bytes.Contains(out, []byte(`"future_field"`)) {
        t.Fatal("future_field stripped on round-trip — R2.1 violation")
    }
}

func TestUnknownEnumDegrades(t *testing.T) {
    // Enum value 99 is not yet known. Must NOT panic; must map to the UNSPECIFIED sentinel.
    k := ParseMessageKind(99)
    if k != MESSAGE_KIND_UNSPECIFIED {
        t.Fatal("unknown enum should degrade to UNSPECIFIED — R2.3 violation")
    }
}
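Decoding into map[string]any keeps every key by construction, so the round-trip check is most meaningful when it runs through the component's typed parser. When a parser decodes into a struct, encoding/json silently drops keys the struct does not declare unless they are captured explicitly. One possible way to keep them is sketched below; the Envelope type and roundTrip helper are hypothetical and only model two known fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope models the fields the current parser knows. Every other key is
// kept verbatim in Extra so it can be re-emitted (hypothetical type).
type Envelope struct {
	V     int
	Kind  string
	Extra map[string]json.RawMessage
}

// UnmarshalJSON decodes known keys and stores the rest untouched.
func (e *Envelope) UnmarshalJSON(data []byte) error {
	raw := map[string]json.RawMessage{}
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	if v, ok := raw["v"]; ok {
		if err := json.Unmarshal(v, &e.V); err != nil {
			return err
		}
	}
	if k, ok := raw["kind"]; ok {
		if err := json.Unmarshal(k, &e.Kind); err != nil {
			return err
		}
	}
	delete(raw, "v")
	delete(raw, "kind")
	e.Extra = raw
	return nil
}

// MarshalJSON re-emits the known fields together with the preserved ones.
func (e Envelope) MarshalJSON() ([]byte, error) {
	out := map[string]any{"v": e.V, "kind": e.Kind}
	for k, v := range e.Extra {
		out[k] = v
	}
	return json.Marshal(out)
}

// roundTrip parses and re-serialises a payload through Envelope.
func roundTrip(in string) (string, error) {
	var e Envelope
	if err := json.Unmarshal([]byte(in), &e); err != nil {
		return "", err
	}
	out, err := json.Marshal(e)
	return string(out), err
}

func main() {
	out, err := roundTrip(`{"v":3,"kind":"MessageDelivered","future_field":{"nested":"x"}}`)
	fmt.Println(out, err)
}
```

Key order is not preserved (encoding/json sorts map keys), which is why the assertion above checks for the presence of future_field and not for byte equality.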

Notes

  • For protobuf: rely on proto3 default of preserving unknown fields (since protoc-gen-go 1.21). Verify with proto.MessageReflect(m).GetUnknown() (github.com/golang/protobuf) or m.ProtoReflect().GetUnknown() (google.golang.org/protobuf).
  • For KMD/KVG/KPKG: parser MUST tolerate unknown directives. See specs/document-format.kmd for the formal contract.

T3 — Rolling upgrade simulated (R4.1, R4.2, R4.3)

Goal: 3 replicas, rolling restart with load generated in parallel, zero 5xx seen by the client during the window.

Setup

# tests/rollout/docker-compose.yml
services:
  lb:
    image: koder-jet:stable
    ports: ["8080:80"]
    depends_on: [app-1, app-2, app-3]
  app-1: &app
    image: ghcr.io/koder/<component>:${VERSION_OLD}
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8080/healthz"]
      interval: 2s
      retries: 3
  app-2:
    <<: *app
  app-3:
    <<: *app

Run

# tests/rollout/rolling-upgrade.sh
docker compose -f tests/rollout/docker-compose.yml up -d
sleep 5  # warm up

# Start load generator (sustained 50 RPS for 90s; hey's -q is per worker, so 5 workers × 10)
hey -z 90s -q 10 -c 5 http://localhost:8080/healthz > /tmp/load.log &
LOAD_PID=$!

# Roll each replica to NEW version, one at a time
for i in 1 2 3; do
  docker compose -f tests/rollout/docker-compose.yml \
    stop "app-$i"
  docker compose -f tests/rollout/docker-compose.yml \
    --env-file=tests/rollout/new-version.env \
    up -d "app-$i"
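  # Wait for this replica's own healthcheck before the next one; polling the
  # LB URL would pass as soon as any other replica answers.
  cid=$(docker compose -f tests/rollout/docker-compose.yml ps -q "app-$i")
  until [ "$(docker inspect --format '{{.State.Health.Status}}' "$cid")" = "healthy" ]; do
    sleep 1
  done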
done

wait $LOAD_PID

Assert

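# Parse hey output: pass if 0 errors (non-2xx) during entire 90s.
# hey prints "[<status>] <count> responses" per status code and adds an
# "Error distribution" section only when transport errors occurred.
errors=$(awk '$3 == "responses" && $1 !~ /^\[2[0-9][0-9]\]$/ { n += $2 } END { print n + 0 }' /tmp/load.log)
if grep -q 'Error distribution' /tmp/load.log; then echo "FAIL: transport errors during rollout"; exit 1; fi
test "$errors" = "0" || { echo "FAIL: $errors non-2xx during rollout"; exit 1; }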

Notes

  • Load tool: hey is the example; substitute wrk2, vegeta, or k6 if your team standardises on those.
  • R4.2: if any replica answers 200 on /healthz before its DB pool is warm, the LB sends traffic to a half-ready instance → 5xx for the user. Tests will detect this naturally.
  • R4.3 graceful shutdown: send SIGTERM (not SIGKILL) during the swap; docker compose stop does this. Verify in-flight requests drain by sampling latency: p99 < 5×p50 during the rollout window.

T4 — Schema migration on a production-like snapshot (R3.1, R3.2, R3.3, R3.4)

Goal: apply the migration to a prod-like snapshot; measure locks, downtime, and failures of concurrent queries; reject if any query blocks for > 100 ms.

Setup

# tests/migrations/prepare-snapshot.sh
# Restore the most recent prod-like snapshot to a scratch DB.
pg_restore -d migration_test \
  --no-owner --clean --if-exists \
  /var/snapshots/${COMPONENT}-prod-like-latest.dump

Run

# tests/migrations/run-with-load.sh
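# 1. Start sustained read+write load against the scratch DB.
#    pgbench writes -P progress lines to stderr, so capture both streams.
#    The built-in script only touches pgbench_* tables (created by pgbench -i);
#    pass -f with component queries to contend with the migrated tables.
pgbench -c 10 -j 2 -T 120 -P 1 migration_test > /tmp/pgbench.log 2>&1 &
PG_PID=$!

# 2. Apply the migration after 10s; stop at the first SQL error
sleep 10
psql migration_test -v ON_ERROR_STOP=1 -f migrations/${VERSION}-up.sql 2>&1 \
  | tee /tmp/migration.log

# 3. Wait for the load run to finish
wait $PG_PID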

Assert

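# pgbench -P prints "progress: <t> s, <tps> tps, lat <ms> ms stddev <ms> ..." once per second.
# Take the value after "lat" (the last field is stddev or a failure count) and look for spikes.
peak=$(awk '/^progress/ { for (i = 1; i < NF; i++) if ($i == "lat") print $(i+1) }' /tmp/pgbench.log | sort -n | tail -1)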
echo "Peak per-second latency_avg: ${peak}ms"
awk -v lim=100 -v peak="$peak" 'BEGIN { exit (peak+0 > lim) ? 1 : 0 }' || {
  echo "FAIL: migration caused ${peak}ms peak latency (limit 100ms) — R3.2 / R3.4 violation"
  exit 1
}

Notes

  • Snapshot freshness: ≤ 24h old; older snapshots may miss recent schema/data shapes that trigger lock paths.
  • Online DDL only: CREATE INDEX CONCURRENTLY, ALTER ... ADD COLUMN (no rewrite), pg_repack for reorders. Any plain CREATE INDEX or bare ALTER NOT NULL on a populated column fails T4 by construction.
  • For kdb: substitute the same pattern with kdb's online DDL paths. See infra/data/kdb/docs/ for the canonical kdb migration helpers.
  • Per migration: T4 runs once per migration file; baseline kept in registries/perf-baseline.md per component.
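For components that prefer to run the T4 assertion from a Go test instead of awk, the same peak-latency extraction can be sketched as below. It assumes the pgbench -P progress lines have been captured into a string; peakLatencyMs is a hypothetical helper, and the sample log lines in main are illustrative, not measured results.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// peakLatencyMs scans pgbench -P output and returns the highest per-interval
// average latency in milliseconds (the value that follows the "lat" token).
func peakLatencyMs(log string) (float64, error) {
	peak, seen := 0.0, false
	sc := bufio.NewScanner(strings.NewReader(log))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] != "progress:" {
			continue // not a progress line
		}
		for i := 0; i+1 < len(fields); i++ {
			if fields[i] != "lat" {
				continue
			}
			v, err := strconv.ParseFloat(fields[i+1], 64)
			if err != nil {
				return 0, fmt.Errorf("bad latency %q: %w", fields[i+1], err)
			}
			if v > peak {
				peak = v
			}
			seen = true
		}
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	if !seen {
		return 0, fmt.Errorf("no progress lines found")
	}
	return peak, nil
}

func main() {
	// Illustrative input only.
	log := "progress: 1.0 s, 812.0 tps, lat 12.291 ms stddev 3.102\n" +
		"progress: 2.0 s, 95.0 tps, lat 104.870 ms stddev 60.441\n"
	peak, err := peakLatencyMs(log)
	fmt.Println(peak, err, "over limit:", peak > 100)
}
```

Returning an error when no progress line is found guards against the silent pass that an empty log would otherwise produce.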

T5 — Chaos: optional dependency dies (R5.1, R5.2, R5.3, R5.4)

Goal: kill an optional dependency → the product still answers 200 on features that do not depend on it; the dependent feature degrades with a clear message (R5.1); the circuit breaker opens (R5.2); timeouts and jitter are applied (R5.3, R5.4).

Setup

# tests/chaos/docker-compose.yml
services:
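  app:
    image: ghcr.io/koder/<component>:stable
    ports: ["8080:8080"]  # assumed container port; the curl calls below hit localhost:8080
    environment:
      # Point at the toxiproxy listener (9999), not its admin API (8474)
      AI_GATEWAY_URL: http://toxiproxy:9999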
  toxiproxy:
    image: ghcr.io/shopify/toxiproxy:2.5.0
    ports: ["8474:8474"]
  ai-gateway:
    image: ghcr.io/koder/ai-gateway:stable

Run

# Bring up; configure toxiproxy to route ai → ai-gateway:9000
docker compose -f tests/chaos/docker-compose.yml up -d
curl -X POST http://localhost:8474/proxies -d '{
  "name": "ai",
  "listen": "0.0.0.0:9999",
  "upstream": "ai-gateway:9000",
  "enabled": true
}'

# Baseline: feature works
curl -fsS http://localhost:8080/generate-with-ai

# Inject failure: 100% packets dropped
curl -X POST http://localhost:8474/proxies/ai/toxics -d '{
  "type": "timeout",
  "attributes": {"timeout": 0}
}'

# Re-test the same and an unrelated endpoint
curl -fsS http://localhost:8080/healthz                      # MUST still 200
status=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/generate-with-ai)
echo "Feature with failed dep returned: $status"

Assert

  • /healthz returns 200 throughout.
  • /generate-with-ai fails fast with a 5xx, preferably 503 + JSON {"error": "ai_unavailable"}; it NEVER hangs past the timeout configured per R5.3.
  • After dependency restored, /generate-with-ai recovers within one circuit-breaker cooldown window (60s default per R5.2).
  • Logs show retries with jitter ≥ 25% between attempts (R5.4).

Notes

  • toxiproxy toxics: timeout, latency, slow_close, bandwidth, slicer. Use timeout: 0 for total outage; latency: 5000 for slow-loris.
  • For non-network deps (disk full, /dev/random blocked), use chaos-mesh or LXC-level fault injection.

T6 — Resumability of upload (R6.1, R6.3, R6.4)

Goal: a 100 MiB upload interrupted at 50% resumes and produces identical final bytes.

Setup

# tests/resume/fixture.sh
dd if=/dev/urandom of=/tmp/payload.bin bs=1M count=100
sha256sum /tmp/payload.bin > /tmp/payload.sha256

Run

# One idempotency key for the whole logical upload, reused on resume (R6.3)
idem_key=$(uuidgen)

# Start upload in background; kill at 50%.
# No subshell around curl: $! must be curl's own PID for kill to reach it.
curl -fsS --upload-file /tmp/payload.bin \
     --header "Idempotency-Key: $idem_key" \
     "${UPLOAD_URL}" \
     > /tmp/upload-1.log 2>&1 &
upload_pid=$!

sleep 1
# Watch byte progress; kill when bytes_sent >= 50 MiB
while :; do
  sent=$(ss -tip "( dport = :8080 )" 2>/dev/null \
         | grep -oP 'bytes_sent:\K[0-9]+' | head -1 || echo 0)
  if [ "${sent:-0}" -ge $((50*1024*1024)) ]; then
    kill -9 "$upload_pid"
    break
  fi
  sleep 0.5
done

# Resume from byte cursor.
# Assumes the server has already written the session JSON to the response
# by the time curl is killed; otherwise create the session in a separate call.
session_id=$(jq -r .session /tmp/upload-1.log)
curl -fsS -H "X-Resume-Session: $session_id" \
     --upload-file /tmp/payload.bin \
     --header "Idempotency-Key: $idem_key" \
     "${UPLOAD_URL}/resume"

Assert

# Server-side: fetch the assembled blob; sha256 must match original
curl -fsS "${UPLOAD_URL}/blob/$session_id" -o /tmp/retrieved.bin
diff <(sha256sum /tmp/retrieved.bin | awk '{print $1}') \
     <(awk '{print $1}' /tmp/payload.sha256) \
  || { echo "FAIL: bytes differ after resume — R6.1 violation"; exit 1; }

Notes

  • Chunk size: 8 MiB default per R6.1; smaller chunks = more bookkeeping, larger = less precise resume cursor.
  • Idempotency key (R6.3): SAME Idempotency-Key on resume must return the same upload metadata, never duplicate.
  • Long-lived session (R6.4): server keeps resume session alive ≥ 5 min by default. Sessions older than the limit return 410 Gone with {"error": "session_expired", "retryable": false}.

T7 — Offline tolerance (mobile/desktop) (R6.2)

Goal: disable the network → the main flow keeps working for local features → re-enable → sync completes with no loss and no duplication.

Setup

// integration_test/offline_test.dart (Flutter)
import 'package:flutter_test/flutter_test.dart';
import 'package:integration_test/integration_test.dart';
import 'package:koder_test_input/koder_test_input.dart';
import 'package:koder_test_state/koder_test_state.dart';

void main() {
  IntegrationTestWidgetsFlutterBinding.ensureInitialized();
  testWidgets('offline create + sync round-trip', (tester) async {
    // 1. Bring app up online; baseline sync clean.
    await KoderTestState.attachToProcess();
    expect(await KoderTestState.outboxCount(), 0);

    // 2. Block network (test SDK helper).
    await KoderTestInput.setNetworkEnabled(false);

    // 3. Create 3 items offline.
    for (var i = 0; i < 3; i++) {
      await KoderTestInput.tap('fab-create');
      await KoderTestInput.enterText('input-title', 'item-$i');
      await KoderTestInput.tap('save');
    }

    // 4. Confirm UI shows them with pending sync state.
    expect(await KoderTestState.outboxCount(), 3);
    for (var i = 0; i < 3; i++) {
      expect(await KoderTestState.itemSyncState('item-$i'), 'pending');
    }

    // 5. Re-enable network. Wait for outbox drain.
    await KoderTestInput.setNetworkEnabled(true);
    await KoderTestState.waitFor(
      () async => (await KoderTestState.outboxCount()) == 0,
      timeout: Duration(seconds: 30),
    );

    // 6. Server sees exactly 3 — no duplicates (Idempotency-Key working).
    final remote = await KoderTestState.serverItems();
    expect(remote.length, 3);
    expect(remote.map((e) => e['title']).toSet(), {'item-0','item-1','item-2'});
  });
}

Run

cd app && flutter test integration_test/offline_test.dart \
  -d linux  # or s.khost1 emulator per test-host-isolation.kmd

Assert

Test passes (Dart-level expectations). Server-side row count matches client-side count (no duplicates from retry-after-reconnect).

Notes

  • Conflict resolution: if the user edits the same item while offline and online sessions run concurrently, last-writer-wins is rarely correct. Specs in specs/data-sync/conflict-resolution.kmd (future) will govern; per-component overrides in koder.toml [sync].
  • Test SDK path: setNetworkEnabled is exposed by engines/sdk/koder_test_input per headless-first.kmd R8. Don't shell out to adb or nmcli directly.

T8 — Cross-surface compat coverage (R8.1, R8.4)

Goal: cross-surface combinations (mobile × desktop × web × TV × CLI) × version pairs inside the R1.1 window, with at least 80% coverage.

Setup

# registries/variant-compat-matrix.md (per-component)
# Each row: client-surface × client-version × server-surface × server-version
# Cell value: { status: pass|fail|untested, evidence: <ci-run-url> }

Run

Automated by CI; manual rows added when an integration test passes:

# .gitea/workflows/cross-surface-compat.yml
jobs:
  matrix:
    strategy:
      matrix:
        client_surface: [mobile, desktop, web, tv, cli]
        client_version: [N-1, N]
        server_version: [N-1, N]
    steps:
      # Block scalar keeps the backslash continuations intact for the shell
      - run: |
          ./tests/cross-surface/run.sh \
            ${{ matrix.client_surface }} \
            ${{ matrix.client_version }} \
            ${{ matrix.server_version }}

Assert

  • Per-component CI publishes coverage.json to registries/variant-compat-matrix.md via koder-spec-audit always-on --report --json.
  • Release gate (CI step): fail if covered < 80% of matrix cells.

Notes

  • Surface multiplication is huge; pragmatic minimum is the largest active surface × oldest surface in window per release. Full N×N matrix only for crit components (auth, sync, identity).
  • Coverage matrix file format will be formalised in specs/testing/coverage-matrix.md v1 (currently v0).
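Until the coverage-matrix format is formalised, the 80% release gate can be sketched against an in-memory list of cells. The Cell type below is hypothetical, and counting both pass and fail as "covered" (only untested cells count against coverage) is an assumption; the source does not define the term.

```go
package main

import "fmt"

// Cell is one row of a hypothetical in-memory compat matrix.
type Cell struct {
	Surface, ClientVersion, ServerVersion string
	Status                                string // "pass", "fail" or "untested"
}

// coverage returns the fraction of cells that were exercised at all.
// Assumption: a failing cell still counts as covered.
func coverage(cells []Cell) float64 {
	if len(cells) == 0 {
		return 0
	}
	covered := 0
	for _, c := range cells {
		if c.Status == "pass" || c.Status == "fail" {
			covered++
		}
	}
	return float64(covered) / float64(len(cells))
}

// gate fails when coverage is below the required minimum (0.80 by default).
func gate(cells []Cell, min float64) error {
	if got := coverage(cells); got < min {
		return fmt.Errorf("coverage %.0f%% is below the %.0f%% gate", got*100, min*100)
	}
	return nil
}

func main() {
	var cells []Cell
	for _, s := range []string{"mobile", "desktop", "web", "tv", "cli"} {
		for _, cv := range []string{"N-1", "N"} {
			for _, sv := range []string{"N-1", "N"} {
				status := "pass"
				if s == "tv" {
					status = "untested" // no TV runner in this example
				}
				cells = append(cells, Cell{s, cv, sv, status})
			}
		}
	}
	fmt.Println(len(cells), "cells, coverage", coverage(cells), "gate:", gate(cells, 0.80))
}
```

The workflow matrix above yields 5 × 2 × 2 = 20 cells, so leaving one whole surface untested sits exactly on the 80% line.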

T9 — Regional failover (R7.1, R7.2, R7.4)

Goal: simulate an outage of the primary region; measure the time until traffic is re-routed; assert ≤ 60s and zero data loss.

Setup

# Two regions: primary (us-east) and replica (eu-west)
# DNS health-check has 30s TTL (R7.2)
# Replica DB lags primary by ≤ 5s async replication

Run

# tests/failover/regional.sh
# 1. Verify primary serving
primary_ip=$(dig +short app.koder.dev | head -1)
echo "Primary IP: $primary_ip"
curl -fsS "https://app.koder.dev/healthz"

# 2. Inject failure: stop primary region (use deployment-specific cmd)
incus stop --project=us-east app-primary

# 3. Mark t0; poll DNS until a different, non-empty address is returned
t0=$(date +%s)
while :; do
  ip=$(dig +short app.koder.dev | head -1)
  if [ -n "$ip" ] && [ "$ip" != "$primary_ip" ]; then break; fi
  sleep 2
  age=$(( $(date +%s) - t0 ))
  if [ "$age" -gt 120 ]; then
    echo "FAIL: failover > 120s (R7.2 violation)"; exit 1
  fi
done
t1=$(date +%s)
echo "Failover completed in $((t1-t0))s"
test $((t1-t0)) -le 60 || { echo "FAIL: failover window > 60s (R7.2 violation)"; exit 1; }

# 4. Verify replica region serving the latest data
last_id_before=$(curl -fsS "https://app.koder.dev/last-record-id-before-failover")
# curl -f never prints "404", so compare the HTTP status code instead
status=$(curl -s -o /dev/null -w '%{http_code}' "https://app.koder.dev/items/$last_id_before")
test "$status" = "200" || {
  echo "FAIL: data lost on failover (R7.4 violation)"; exit 1
}

Assert

  • Failover window ≤ 60s (default; R7.2).
  • Zero data loss for committed writes ≥ replication-lag seconds before the outage.
  • /healthz from the new primary returns 200 within 5s of the DNS flip.

Notes

  • Replication lag is the data-loss budget. Async replication at ≤ 5s lag means writes done < 5s before outage may not have replicated. Per R7.4, backup snapshots (≥ hourly) bound this further.
  • DNS TTL: 30s default per R7.2; tighter TTL = faster failover but more DNS query load. Calibrate per component.
  • Anycast components (Koder Jet, Koder ID) skip DNS-flip and use BGP withdrawal; the same outcome (≤ 60s window, zero data loss) applies but the mechanism differs. Anycast-specific recipe pending.

Coverage gate

Per policies/always-on.kmd § Gate de release, a component cannot release if T1–T9 are missing or failing and the gate is not deferred via always-on-debt.md. The auditor koder-spec-audit always-on --strict reads this state and exits non-zero on missing coverage.

Per-component implementation lives in <component>/tests/ paths:

Test  Default location
T1    <component>/tests/compat/
T2    <component>/internal/wire/*_test.go (or equivalent)
T3    <component>/tests/rollout/
T4    <component>/tests/migrations/
T5    <component>/tests/chaos/
T6    <component>/tests/resume/
T7    <component>/integration_test/ (Flutter) or <component>/tests/offline/
T8    CI matrix + registries/variant-compat-matrix.md rows
T9    <component>/tests/failover/ (single-region: opt out via debt entry)

Status

  • v0.1 (2026-05-24): initial recipes. T1–T5 and T9 have reusable examples. T7 depends on the Dart SDKs (koder_test_input, koder_test_state); check the version before copying.
  • Promotion to v1.0: once ≥ 3 components have shipped at least T1+T3+T5 and validated the recipes in production.
  • Next slices: a recipe for IPC between Koder apps (pairs with specs/ipc/protocol.kmd); an anycast/BGP recipe as the T9 alternative.

References