Workflow Executor Protocol

workflow specs/workflow/executor-protocol.kmd

Contract for how the Koder Workflow engine (services/ai/workflow) invokes the services that execute each step kind (llm→gateway, tool→tools, code→sandbox, agent→agents) and how the response becomes the step's output. Defines the common Executor interface (step→call→output mapping, tenant/auth propagation, timeout, idempotency, error mapping) and the per-kind binding. Contract readiness: `code`→sandbox is READY and shipped; `llm` binds to the existing gateway `POST /v1/chat/completions`; `tool` and `agent` expose only adjacent endpoints today, so this spec specifies their EXECUTION contracts as required of the owning services. Unblocks the per-kind executors of workflow#008.

When this spec applies

All triggers

Implementing a per-kind Executor in the Koder Workflow engine (workflow#008)
Adding or changing the gateway/tools/sandbox/agents execution endpoint consumed by the workflow
Mapping the output of an execution service to the output of a workflow step
Defining tenant/timeout/idempotency propagation in a workflow→service call

Spec — Workflow Executor Protocol

Status: Stable v1.0.0 (ratified 2026-05-28), amended by v1.0.1 on the same date (§4.1). Normative. The §4.3 and §4.4 endpoints are required of their owning services (tools, agents) and tracked as TOOLS-021 and AGENTS-022. The §4.1 ticket, AIGW-054, closed with the v1.0.1 amendment because the existing gateway endpoint suffices. The §4.3 invocation pattern is fixed to option (a), tools-mediated invoke. The R-rules (§3) are normative for any executor consuming this protocol.

1. Scope

The Koder Workflow engine (services/ai/workflow/backend/internal/engine) advances a run by executing each ready step through a single seam — the Executor interface:

type Executor interface {
    Execute(ctx context.Context, run *Run, stepName string, step dsl.Step) (any, error)
}
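One way to picture the single seam is a router that is itself an Executor and forwards each step to the executor registered for its kind. This is a sketch only: `Run`, `Step` (with a `Kind` field), `ExecutorFunc`, `KindRouter`, and `demoRoute` are stand-ins invented for illustration, not the engine's actual types.

```go
package main

import (
	"context"
	"fmt"
)

// Run and Step are simplified stand-ins for the engine's *Run and dsl.Step.
type Run struct{ ID string }

type Step struct {
	Kind   string // assumed field: "llm", "tool", "code", "agent", ...
	Config map[string]any
}

// Executor mirrors the interface above, with the stand-in Step type.
type Executor interface {
	Execute(ctx context.Context, run *Run, stepName string, step Step) (any, error)
}

// ExecutorFunc adapts a plain function to the Executor interface.
type ExecutorFunc func(ctx context.Context, run *Run, stepName string, step Step) (any, error)

func (f ExecutorFunc) Execute(ctx context.Context, run *Run, stepName string, step Step) (any, error) {
	return f(ctx, run, stepName, step)
}

// KindRouter forwards a step to the executor registered for step.Kind.
type KindRouter map[string]Executor

func (k KindRouter) Execute(ctx context.Context, run *Run, stepName string, step Step) (any, error) {
	ex, ok := k[step.Kind]
	if !ok {
		return nil, fmt.Errorf("step %q: no executor for kind %q", stepName, step.Kind)
	}
	return ex.Execute(ctx, run, stepName, step)
}

// demoRoute registers a fake "code" executor and routes one step of the given kind.
func demoRoute(kind string) (any, error) {
	router := KindRouter{
		"code": ExecutorFunc(func(_ context.Context, _ *Run, _ string, s Step) (any, error) {
			return map[string]any{"exit_code": 0, "stdout": s.Config["source"]}, nil
		}),
	}
	step := Step{Kind: kind, Config: map[string]any{"source": "print(1)"}}
	return router.Execute(context.Background(), &Run{ID: "run-1"}, "build", step)
}

func main() {
	fmt.Println(demoRoute("code")) // routed to the fake code executor
	fmt.Println(demoRoute("llm"))  // no executor registered: returns an error
}
```

Keeping the router behind the same interface means the advance logic would not need to know which kinds are wired.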

The engine already ships executors for the in-process kinds (subflow → nested run; human → pause+signal; branch/parallel/aggregate handled in the advance logic). This spec covers the out-of-process kinds — the ones that call another Koder service:

Step kind	Target service	Step config (from `pkg/authoring`)
`llm`	`services/ai/gateway`	`{model, prompt}`
`tool`	`services/ai/tools`	`{tool}`
`code`	`services/ai/sandbox`	`{language, source}`
`agent`	`services/ai/agents`	`{agent}`

It defines (a) the common contract every out-of-process executor obeys (§3) and (b) the per-kind binding to each service's wire API (§4), flagging which contracts are READY vs TO-DEFINE.

This is the unblock artifact for workflow#008 ("engine production wiring"): at Draft v0.1.0 its per-kind executors were blocked because three of the four execution contracts below did not exist. §2 records the current state of each contract.

2. Readiness (ratified 2026-05-28)

Kind	Service endpoint	State
`code`	`POST /v1/sandbox/sessions/{id}/exec` (`ExecRequest`→`ExecResponse`)	✅ SHIPPED — `CodeExecutor` + `HTTPSandboxClient` (workflow#008 `code.go`, 2026-05-26)
`llm`	`POST /v1/chat/completions` (existing, OpenAI-shaped)	🟢 READY — endpoint exists; §4.1 amended (v1.0.1) to point at it; executor ships against this shape
`tool`	`POST /v1/tools/{name}/invoke` (tools-mediated, TOOLS-021)	🟢 SPECIFIED — invoke shape ratified §4.3 (a); executor pending endpoint ship
`agent`	`POST /v1/agents/{id}/run` (proposed, AGENTS-022)	🟢 SPECIFIED — run shape ratified §4.4; executor pending endpoint ship

code ships today, and llm ships against the existing gateway endpoint. tool and agent are blocked by the §6 follow-up tickets on their owning services (TOOLS-021, AGENTS-022); each executor ships as the corresponding endpoint lands.

3. Common executor contract (R-rules)

R1 — Request derivation. Each executor builds its request from step.Config (the typed kind config) plus the run scope (run.State, prior steps.X.output in run.History, run.Inputs). Template tokens (${inputs.topic}, ${steps.research.output}) are rendered by the engine before the executor sees them (consistent with the CEL guard scope).
R2 — Tenant propagation. The call carries run's koder_user_id (and workspace_id if set) per policies/multi-tenant-by-default.kmd. A cross-tenant target resolves to 404, never 403. No executor may call a service without an authenticated tenant context.
R3 — Auth. Service-to-service auth uses the Koder ID service-account token of the workflow runner (not the end user's session). The end-user tenant rides in the request body/header per R2, not as the auth principal.
R4 — Timeout. The executor honors step.TimeoutSec (0 = service default). It passes the deadline to the service where the wire supports it (e.g. sandbox timeout_ms) AND enforces it client-side via ctx.
R5 — Idempotency. When step.IdempotencyKey is set (rendered per R1), the executor sends it so a retried step does not double-execute a side-effecting call. The engine already dedupes identical renders within a run; the service must treat the key as a dedupe token where it mutates.
R6 — Output mapping. A successful call returns (any, error) where the any is the JSON-decodable step output the engine writes to History (and to state.<output_to> when the step declares it). §4 fixes the exact shape per kind. Outputs MUST be JSON round-trippable (the store persists them as JSONB).
R7 — Error mapping. A transport/5xx/timeout error is retryable (the engine's per-step retry policy applies). A 4xx/validation error is terminal (fails the step without retry). The executor returns a typed error the engine can classify; it never panics on a service error.
R8 — Large outputs. Outputs over the inline threshold (see workflow#008 kdb-blob item) are stored by reference, not inlined in the run doc. Until kdb-blob ships, executors cap inline output and truncate with a marker (mirroring sandbox's stdout_truncated).
R9 — Observability. Executors run behind the engine Observer seam (StepStarted/StepFinished); they do not emit their own metrics/traces — the engine owns per-step instrumentation (workflow#008 observability item).
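R4 and R7 can be sketched together as a client-side deadline plus a typed, classifiable error. The names `ExecError`, `classify`, `withStepDeadline`, and `hasDeadline` are hypothetical; the engine's real error type is not defined in this spec, and the sketch treats every transport-level error as retryable without distinguishing caller cancellation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// ExecError is a hypothetical typed error the engine could classify (R7).
type ExecError struct {
	Status    int  // HTTP status; 0 when no response was received
	Retryable bool // true: the per-step retry policy applies
	Err       error
}

func (e *ExecError) Error() string {
	return fmt.Sprintf("executor call failed (status=%d, retryable=%t): %v", e.Status, e.Retryable, e.Err)
}

func (e *ExecError) Unwrap() error { return e.Err }

// classify applies R7: transport errors, timeouts, and 5xx are retryable;
// 4xx is terminal. It returns nil for a successful call.
func classify(status int, err error) *ExecError {
	switch {
	case err != nil:
		return &ExecError{Status: status, Retryable: true, Err: err}
	case status >= 500:
		return &ExecError{Status: status, Retryable: true, Err: errors.New(http.StatusText(status))}
	case status >= 400:
		return &ExecError{Status: status, Retryable: false, Err: errors.New(http.StatusText(status))}
	default:
		return nil
	}
}

// withStepDeadline enforces step.TimeoutSec client-side (R4).
// A value of 0 adds no deadline, leaving the service default in force.
func withStepDeadline(ctx context.Context, timeoutSec int) (context.Context, context.CancelFunc) {
	if timeoutSec <= 0 {
		return context.WithCancel(ctx)
	}
	return context.WithTimeout(ctx, time.Duration(timeoutSec)*time.Second)
}

// hasDeadline reports whether a step timeout produces a context deadline.
func hasDeadline(timeoutSec int) bool {
	ctx, cancel := withStepDeadline(context.Background(), timeoutSec)
	defer cancel()
	_, ok := ctx.Deadline()
	return ok
}

func main() {
	fmt.Println(classify(503, nil))                      // retryable
	fmt.Println(classify(422, nil))                      // terminal
	fmt.Println(classify(0, context.DeadlineExceeded))   // retryable
	fmt.Println(hasDeadline(30), hasDeadline(0))
}
```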

4. Per-kind binding

4.1 `llm` → gateway 🟢 RATIFIED (existing `/v1/chat/completions`)

Reconnaissance on 2026-05-28 surfaced that gateway already exposes POST /v1/chat/completions — an OpenAI-compatible chat-completion endpoint with production-grade dispatch (validation, plugin transform, BYOK keys, A/B test variant selection, smart-router model resolution, alias resolution, audit, quality-eval). The Stable v1.0.1 amendment (2026-05-28) re-spec'd §4.1 to point at this existing endpoint rather than mint a new /api/v1/completions (the Draft v0.1.0 proposal), per policies/reuse-first.kmd.

POST /v1/chat/completions
  { "model": "<model>",
    "messages": [{"role": "user", "content": "<rendered prompt>"}],
    "max_tokens"?: int, "temperature"?: float, "stop"?: [string] }
→ 200 {
    "id": "<chatcmpl-…>", "model": "<model>", "created": <unix>,
    "choices": [{
      "index": 0,
      "message": {"role": "assistant", "content": "<text>"},
      "finish_reason": "<stop|length|…>"
    }],
    "usage": {"prompt_tokens": <int>, "completion_tokens": <int>, "total_tokens": <int>}
  }

Headers: Authorization: Bearer <token> (R3), X-Koder-Tenant: <koder_user_id> (R2). The workflow llm step's {model, prompt} config maps to model + a single-message messages array; the step output mapping (R6) is {output: choices[0].message.content, model_used: model, usage: usage}.
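The binding above could look roughly like this in Go. It is a sketch, not the shipped HTTPLLMClient: `buildLLMRequest`, `mapLLMOutput`, and the struct names are hypothetical, and `model_used` is taken from the response's `model` field, which is one reading of the mapping.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

type usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
	TotalTokens      int `json:"total_tokens"`
}

type chatResponse struct {
	Model   string `json:"model"`
	Choices []struct {
		Message chatMessage `json:"message"`
	} `json:"choices"`
	Usage usage `json:"usage"`
}

// buildLLMRequest maps {model, prompt} to a single-message chat-completions
// request. The prompt is assumed to be already rendered per R1.
func buildLLMRequest(ctx context.Context, baseURL, token, tenant, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token) // R3: service-account token
	req.Header.Set("X-Koder-Tenant", tenant)         // R2: end-user tenant
	return req, nil
}

// mapLLMOutput applies the R6 mapping to a 200 response body.
func mapLLMOutput(raw []byte) (map[string]any, error) {
	var resp chatResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	if len(resp.Choices) == 0 {
		return nil, fmt.Errorf("chat completion returned no choices")
	}
	return map[string]any{
		"output":     resp.Choices[0].Message.Content,
		"model_used": resp.Model,
		"usage":      resp.Usage,
	}, nil
}

func main() {
	req, err := buildLLMRequest(context.Background(), "http://gateway.invalid", "svc-token", "user-1", "example-model", "Summarize the topic")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("X-Koder-Tenant"))

	out, err := mapLLMOutput([]byte(`{"model":"example-model","choices":[{"index":0,"message":{"role":"assistant","content":"hello"},"finish_reason":"stop"}],"usage":{"prompt_tokens":3,"completion_tokens":1,"total_tokens":4}}`))
	fmt.Println(out, err)
}
```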

Tracked as AIGW-054 (closed 2026-05-28 — no new endpoint required; spec amendment supersedes). The workflow LLMExecutor ships its HTTPLLMClient calling this endpoint directly.

4.2 `code` → sandbox ✅ READY

Two calls: ensure a session, then exec.

POST /v1/sandbox/sessions                       → { "id": "<session>" }   (per-run or per-step)
POST /v1/sandbox/sessions/{id}/exec
  ExecRequest{ code, stdin?, env?, cwd?, timeout_ms?, async? }
→ ExecResponse{ exec_id, exit_code, stdout, stderr, stdout_truncated?,
                stderr_truncated?, duration_ms, peak_memory_kib?, oom?,
                killed_reason? }

Binding: step.Config.source → code; step.Config.language selects the session runtime (chosen at session create). step.TimeoutSec*1000 → timeout_ms (R4). Sync mode (async:false) for a normal step. Step output (R6): { "exit_code", "stdout", "stderr", "duration_ms", "oom", "killed_reason" }. R7: exit_code != 0 is a terminal step failure carrying stderr (not a transport retry); oom/killed_reason likewise terminal. Session lifecycle (reuse per run vs per step, teardown on run completion) is an executor decision documented in workflow#008.
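A minimal sketch of the output and error mapping for the code kind follows. The Go field types in `ExecResponse` are assumptions (the spec names the fields but not their types), and `mapExecOutput` and `timeoutMs` are hypothetical helpers. The sketch returns the mapped output alongside the terminal error so the failure can carry stderr; whether the engine persists output on failure is not specified here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ExecResponse holds the subset of sandbox response fields this sketch reads.
type ExecResponse struct {
	ExitCode     int    `json:"exit_code"`
	Stdout       string `json:"stdout"`
	Stderr       string `json:"stderr"`
	DurationMs   int64  `json:"duration_ms"`
	OOM          bool   `json:"oom"`
	KilledReason string `json:"killed_reason"`
}

// timeoutMs converts step.TimeoutSec to the wire timeout_ms (R4).
// A result of 0 means "omit the field and use the service default".
func timeoutMs(timeoutSec int) int { return timeoutSec * 1000 }

// mapExecOutput builds the R6 step output and reports a terminal failure (R7)
// for a non-zero exit code, an OOM, or a killed process.
func mapExecOutput(raw []byte) (map[string]any, error) {
	var r ExecResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	out := map[string]any{
		"exit_code":     r.ExitCode,
		"stdout":        r.Stdout,
		"stderr":        r.Stderr,
		"duration_ms":   r.DurationMs,
		"oom":           r.OOM,
		"killed_reason": r.KilledReason,
	}
	if r.ExitCode != 0 || r.OOM || r.KilledReason != "" {
		return out, fmt.Errorf("code step failed (terminal): exit_code=%d oom=%t killed_reason=%q stderr=%q",
			r.ExitCode, r.OOM, r.KilledReason, r.Stderr)
	}
	return out, nil
}

func main() {
	fmt.Println(timeoutMs(30))
	fmt.Println(mapExecOutput([]byte(`{"exec_id":"e1","exit_code":0,"stdout":"ok\n","stderr":"","duration_ms":12}`)))
	fmt.Println(mapExecOutput([]byte(`{"exec_id":"e2","exit_code":2,"stdout":"","stderr":"boom","duration_ms":3}`)))
}
```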

4.3 `tool` → tools 🟢 RATIFIED (tools-mediated invoke)

tools is a registry (GET/PUT/DELETE /v1/tools/{name}, /schema, /lookup) — it stores tool definitions; it does not invoke them today. Invocation runs via MCP (the mcp/ subtree). Ratification on 2026-05-28 fixed the invoke path to tools-mediated (registry + invoke colocated):

POST /v1/tools/{name}/invoke
  { "args": {...}, "tenant": "<koder_user_id>",
    "timeout_ms"?: int, "idempotency_key"?: string }
→ 200 { "output": <per registry schema>, "duration_ms": int }

Internally tools resolves the registry entry, dispatches to the underlying backend (MCP server, gateway-mediated tool call, builtin runner per TOOLS-015), validates the response against GET /v1/tools/{name}/schema, and returns. Tracked in TOOLS-021 (status pending → in-progress on pickup).

Binding: step.Config.tool → tool name; step inputs → args (rendered per R1). Step output (R6): the tool's declared output schema; the executor SHOULD validate the response against it.
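The invoke request and response handling might be sketched as follows. `invokeURL`, `buildInvokeBody`, and `mapInvokeOutput` are hypothetical helper names; the optional fields are omitted from the body when unset, and schema validation of the output is left out of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// invokeRequest follows the ratified §4.3 request shape.
type invokeRequest struct {
	Args           map[string]any `json:"args"`
	Tenant         string         `json:"tenant"`
	TimeoutMs      int            `json:"timeout_ms,omitempty"`
	IdempotencyKey string         `json:"idempotency_key,omitempty"`
}

type invokeResponse struct {
	Output     json.RawMessage `json:"output"`
	DurationMs int64           `json:"duration_ms"`
}

// invokeURL builds the endpoint path, escaping the tool name as one path segment.
func invokeURL(baseURL, tool string) string {
	return baseURL + "/v1/tools/" + url.PathEscape(tool) + "/invoke"
}

// buildInvokeBody encodes the request. It refuses an empty tenant (R2),
// converts the step timeout to milliseconds (R4), and forwards the key (R5).
func buildInvokeBody(args map[string]any, tenant string, timeoutSec int, idempotencyKey string) ([]byte, error) {
	if tenant == "" {
		return nil, fmt.Errorf("tenant is required (R2)")
	}
	if args == nil {
		args = map[string]any{} // encode as {} rather than null
	}
	return json.Marshal(invokeRequest{
		Args:           args,
		Tenant:         tenant,
		TimeoutMs:      timeoutSec * 1000,
		IdempotencyKey: idempotencyKey,
	})
}

// mapInvokeOutput returns the decoded "output" value as the step output (R6).
func mapInvokeOutput(raw []byte) (any, error) {
	var resp invokeResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	if len(resp.Output) == 0 {
		return nil, fmt.Errorf("invoke response has no output field")
	}
	var out any
	if err := json.Unmarshal(resp.Output, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	fmt.Println(invokeURL("http://tools.invalid", "web_search"))
	body, err := buildInvokeBody(map[string]any{"q": "koder"}, "user-1", 5, "run-1:search")
	fmt.Println(string(body), err)
	fmt.Println(mapInvokeOutput([]byte(`{"output":{"hits":3},"duration_ms":5}`)))
}
```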

4.4 `agent` → agents 🟢 SPECIFIED (run shape ratified; endpoint pending AGENTS-022)

agents exposes memory + MCP-server + strategy management (/v1/agents/{id}/..., /v1/strategies) — not a run endpoint. The agent executor needs:

POST /v1/agents/{id}/run        (proposed)
  { "input": {...}, "tenant": "...", "strategy"?: "<name>" }
→ { "output": ..., "steps": [...], "stop_reason": "<...>" }

Likely async (an agent loop is long-running): the executor SHOULD start the run and either stream or poll, mapping the agent's terminal state to the step output, with the engine's per-step lease (workflow#008) covering the wait. Binding: step.Config.agent → {id}. Step output (R6): { "output", "stop_reason", "steps"? }. Blocked until the run endpoint exists.
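The poll variant can be sketched independently of the wire, since no status endpoint is defined yet. In this sketch `pollAgent`, `agentRun`, and `demoPoll` are hypothetical, the `fetch` callback stands in for whatever call AGENTS-022 ends up exposing, and an empty `stop_reason` is assumed to mean the run is still in progress.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// agentRun follows the proposed §4.4 response shape.
type agentRun struct {
	Output     any    `json:"output"`
	Steps      []any  `json:"steps,omitempty"`
	StopReason string `json:"stop_reason"`
}

// pollAgent calls fetch until the run reports a stop_reason or ctx ends,
// then maps the terminal state to the step output (R6).
// interval must be positive (time.NewTicker panics otherwise).
func pollAgent(ctx context.Context, interval time.Duration, fetch func(context.Context) (agentRun, error)) (map[string]any, error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		run, err := fetch(ctx)
		if err != nil {
			return nil, err
		}
		if run.StopReason != "" {
			out := map[string]any{"output": run.Output, "stop_reason": run.StopReason}
			if len(run.Steps) > 0 {
				out["steps"] = run.Steps
			}
			return out, nil
		}
		select {
		case <-ctx.Done():
			return nil, ctx.Err() // the R4 deadline or a cancellation ends the wait
		case <-ticker.C:
		}
	}
}

// demoPoll simulates a run that finishes on the third status fetch.
func demoPoll() (map[string]any, error) {
	calls := 0
	fetch := func(context.Context) (agentRun, error) {
		calls++
		if calls < 3 {
			return agentRun{}, nil // still running
		}
		return agentRun{Output: "done", StopReason: "completed"}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return pollAgent(ctx, time.Millisecond, fetch)
}

func main() {
	fmt.Println(demoPoll())
}
```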

5. Testing (laptop-clean slice)

Per policies/headless-first.kmd and the established subflow.go pattern, each executor is unit-testable without the live service: a net/http/httptest mock server returning the §4 response shapes proves request derivation (R1/R2/R4/R5), output mapping (R6), and error classification (R7) in-process. The mock-test slice ships with the executor; live integration (against the real service on the dev VM / serve-time) is the follow-up. The code executor can do both today (sandbox contract is READY). The llm executor calls the existing gateway endpoint (§4.1). The tool and agent executors ship the mock-tested client against the §4.3/§4.4 shapes, with live wiring gated on the real endpoints.
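A mock-server check of this kind might look like the sketch below, using the §4.3 invoke shape. `callInvoke`, `newMockTools`, and `checkAgainstMock` are hypothetical names, and the client is reduced to the minimum needed to exercise tenant propagation, output mapping, and a 4xx failure. A real test would live in a `_test.go` file and use `testing.T`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// callInvoke is a minimal stand-in client for POST /v1/tools/{name}/invoke.
func callInvoke(client *http.Client, baseURL, tool, tenant string) (any, int, error) {
	body, err := json.Marshal(map[string]any{"args": map[string]any{}, "tenant": tenant})
	if err != nil {
		return nil, 0, err
	}
	resp, err := client.Post(baseURL+"/v1/tools/"+tool+"/invoke", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, 0, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, resp.StatusCode, fmt.Errorf("tools returned status %d", resp.StatusCode)
	}
	var decoded struct {
		Output any `json:"output"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&decoded); err != nil {
		return nil, resp.StatusCode, err
	}
	return decoded.Output, resp.StatusCode, nil
}

// newMockTools starts a mock server that records the tenant it received
// and answers with the given status and a fixed §4.3-shaped body.
func newMockTools(status int, gotTenant *string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Tenant string `json:"tenant"`
		}
		_ = json.NewDecoder(r.Body).Decode(&req)
		*gotTenant = req.Tenant
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(status)
		if status == http.StatusOK {
			fmt.Fprint(w, `{"output":{"ok":true},"duration_ms":1}`)
		}
	}))
}

// checkAgainstMock exercises R2 (tenant), R6 (output), and R7 (4xx failure).
func checkAgainstMock() error {
	var tenant string
	okSrv := newMockTools(http.StatusOK, &tenant)
	defer okSrv.Close()
	out, _, err := callInvoke(okSrv.Client(), okSrv.URL, "echo", "user-1")
	if err != nil {
		return err
	}
	if tenant != "user-1" {
		return fmt.Errorf("tenant not propagated: got %q", tenant)
	}
	if m, isMap := out.(map[string]any); !isMap || m["ok"] != true {
		return fmt.Errorf("unexpected output: %v", out)
	}
	badSrv := newMockTools(http.StatusUnprocessableEntity, &tenant)
	defer badSrv.Close()
	if _, status, err := callInvoke(badSrv.Client(), badSrv.URL, "echo", "user-1"); err == nil || status != http.StatusUnprocessableEntity {
		return fmt.Errorf("expected a 422 failure, got status=%d err=%v", status, err)
	}
	return nil
}

func main() {
	if err := checkAgainstMock(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```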

6. Follow-up tickets

~~AIGW-054~~: closed 2026-05-28 via the v1.0.1 amendment. §4.1 now points at the existing POST /v1/chat/completions; no new endpoint required.
TOOLS-021: services/ai/tools adds the tools-mediated invoke endpoint per the ratified §4.3 shape.
AGENTS-022: services/ai/agents adds POST /v1/agents/{id}/run (§4.4).
WORKFLOW-008: services/ai/workflow ships the code executor (READY, shipped) and the llm executor against /v1/chat/completions (shipped), plus mock-tested tool/agent executors, with live wiring as the TOOLS-021 / AGENTS-022 endpoints land.

7. Cross-link

rfcs/workflow-RFC-001-foundations.kmd (engine + DSL foundations)
policies/multi-tenant-by-default.kmd (R2), policies/always-on.kmd (R7 retryability + N/N-1 wire stability), policies/headless-first.kmd (§5)

References

rfcs/workflow-RFC-001-foundations.kmd
services/ai/workflow (engine — internal/engine/engine.go Executor interface)
services/ai/sandbox (POST /v1/sandbox/sessions/{id}/exec — READY)
services/ai/gateway (POST /v1/chat/completions, READY per §4.1)
services/ai/tools (GET/PUT /v1/tools/{name} — registry only)
services/ai/agents (/v1/agents/{id}/* — config/memory only)
policies/multi-tenant-by-default.kmd
policies/always-on.kmd