Spec — Multi-tenancy contract

specs/multi-tenancy/contract.kmd

Concrete mechanisms for implementing policies/multi-tenant-by-default.kmd. This spec is normative: every Koder Stack module that stores user data must pass checks T1–T9 below.

Identity model

koder_user_id   BIGINT NOT NULL    -- FK to services/foundation/id.user(id)
workspace_id    BIGINT             -- nullable; FK to id.workspace(id)

koder_user_id is the canonical partial PK on every table holding PII. workspace_id widens the scope: NULL means "personal"; non-NULL means "belongs to the workspace; every member can see it via membership".

Canonical membership table (Koder ID):

CREATE TABLE workspace_member (
  workspace_id BIGINT NOT NULL,
  koder_user_id BIGINT NOT NULL,
  role TEXT NOT NULL,             -- 'owner' | 'admin' | 'member' | 'viewer'
  joined_unix BIGINT NOT NULL,
  PRIMARY KEY (workspace_id, koder_user_id)
);

Every cross-workspace query goes through this table. Do not cache membership client-side: it is on the auth hot path and lives in services/foundation/id with a 60s server-side cache.
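A minimal sketch of that server-side lookup, assuming a database/sql handle; the MemberCache type and Role method are illustrative names, not part of the spec:

// Sketch of the services/foundation/id membership lookup.
package idsvc

import (
    "context"
    "database/sql"
    "errors"
    "fmt"
    "sync"
    "time"
)

const ttl = 60 * time.Second // server-side cache TTL from the spec

type MemberCache struct {
    DB      *sql.DB
    mu      sync.Mutex
    entries map[string]cacheEntry
}

type cacheEntry struct {
    role    string // "" means "not a member"
    expires time.Time
}

// Role resolves uid's role in workspace wid, caching results for 60s.
// An empty role with a nil error means "not a member".
func (c *MemberCache) Role(ctx context.Context, wid, uid int64) (string, error) {
    key := fmt.Sprintf("%d:%d", wid, uid)

    c.mu.Lock()
    if e, ok := c.entries[key]; ok && time.Now().Before(e.expires) {
        c.mu.Unlock()
        return e.role, nil
    }
    c.mu.Unlock()

    var role string
    err := c.DB.QueryRowContext(ctx,
        `SELECT role FROM workspace_member
          WHERE workspace_id = $1 AND koder_user_id = $2`,
        wid, uid).Scan(&role)
    if err != nil && !errors.Is(err, sql.ErrNoRows) {
        return "", err // real DB error: do not cache
    }

    c.mu.Lock()
    if c.entries == nil {
        c.entries = map[string]cacheEntry{}
    }
    c.entries[key] = cacheEntry{role: role, expires: time.Now().Add(ttl)}
    c.mu.Unlock()
    return role, nil
}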

PAT scope grammar

A PAT (Personal Access Token) issued by Koder ID carries a scope. Canonical syntax (inherited from Flow RFC-003 credentials/backups):

<verb>:<resource>[:<modifier>]

verbs:    read | write | admin
resources: user | workspace | repo | credentials | backups | …
modifier:  optional, e.g. "self" or "<id>"

Examples:

  • read:user — read your own profile
  • write:credentials — write credentials in the scope the PAT inherited
  • read:workspace:<id> — read data for a specific workspace
  • admin:user — privileged self-management

A PAT is scoped to a single koder_user_id (the owner). Workspace access is resolved via workspace_member at request time, not via PAT scope. PATs never cross tenants.
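The grammar is small enough to pin down in code. A sketch, assuming a grant with no modifier covers every modifier of that resource; the Scope type and Allows helper are illustrative:

package pat

import (
    "fmt"
    "strings"
)

// Scope is one parsed <verb>:<resource>[:<modifier>] grant.
type Scope struct {
    Verb     string // read | write | admin
    Resource string // user | workspace | repo | credentials | backups | …
    Modifier string // optional: "self", a concrete id, or empty
}

// Parse splits a scope string per the canonical grammar.
func Parse(s string) (Scope, error) {
    parts := strings.SplitN(s, ":", 3)
    if len(parts) < 2 {
        return Scope{}, fmt.Errorf("malformed scope %q", s)
    }
    sc := Scope{Verb: parts[0], Resource: parts[1]}
    if len(parts) == 3 {
        sc.Modifier = parts[2]
    }
    switch sc.Verb {
    case "read", "write", "admin":
        return sc, nil
    default:
        return Scope{}, fmt.Errorf("unknown verb %q in %q", sc.Verb, s)
    }
}

// Allows reports whether grant g covers a requested scope. A grant with no
// modifier covers every modifier of that resource; otherwise exact match.
// (Whether admin implies write is a spec decision, not assumed here.)
func (g Scope) Allows(req Scope) bool {
    if g.Verb != req.Verb || g.Resource != req.Resource {
        return false
    }
    return g.Modifier == "" || g.Modifier == req.Modifier
}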

RLS template (Postgres / kdb-next)

Every table with PII gets RLS. Helper migration:

-- 1. Schema with tenant fields
CREATE TABLE my_resource (
  id BIGSERIAL,
  koder_user_id BIGINT NOT NULL REFERENCES koder_id."user"(id),  -- "user" is reserved; quote it
  workspace_id BIGINT REFERENCES koder_id.workspace(id),
  payload JSONB NOT NULL,
  created_unix BIGINT NOT NULL DEFAULT extract(epoch from now()),
  PRIMARY KEY (koder_user_id, id)
);

-- 2. Index on tenant + recent
CREATE INDEX ix_my_resource_user_recent
  ON my_resource (koder_user_id, created_unix DESC);

-- 3. RLS enable + policy
ALTER TABLE my_resource ENABLE ROW LEVEL SECURITY;

CREATE POLICY p_owner ON my_resource
  USING (koder_user_id = current_setting('koder.uid')::BIGINT);

CREATE POLICY p_workspace_member ON my_resource
  USING (workspace_id IS NOT NULL
         AND EXISTS (
           SELECT 1 FROM koder_id.workspace_member m
           WHERE m.workspace_id = my_resource.workspace_id
             AND m.koder_user_id = current_setting('koder.uid')::BIGINT
         ));

Connection setup (per request):

// SET LOCAL cannot take bind parameters, and LOCAL settings only live inside
// a transaction, so run set_config(..., is_local = true) on the request's tx:
tx.Exec(ctx, "SELECT set_config('koder.uid', $1::text, true)", auth.UserID)
// queries on this transaction are RLS-filtered automatically

Admin bypass path (rare): RESET koder.uid is a privilege of the koder_admin role only. An audit log entry is mandatory for every reset.

KV / cache template (Redis-style)

Every key carries a tenant prefix:

<namespace>:<tenant-key>:<resource-key>

examples:
  rate_limit:user:<uid>:5h_window     → counter
  session:user:<uid>:<session_id>     → JSON
  presence:workspace:<wid>:<uid>      → boolean

Helper:

// TenantKey builds the tenant-scoped part of a cache key; the caller
// prepends the namespace, e.g. "session:" + TenantKey(uid, sid).
func TenantKey(uid int64, parts ...string) string {
    return fmt.Sprintf("user:%d:%s", uid, strings.Join(parts, ":"))
}

A key without the prefix is a critical bug (cross-tenant leak via the cache).
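Usage against the examples above (uid 42 is illustrative):

key := "rate_limit:" + TenantKey(42, "5h_window")
// key == "rate_limit:user:42:5h_window"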

S3 / object storage template

Path:

<bucket>/<koder_user_id>/<workspace_id|"personal">/<resource_id>/<file>

IAM / signed-URL: per-request, restricted to the tenant prefix.
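A minimal key builder for this layout; the ObjectKey name and signature are illustrative. Signed URLs are then issued only for keys under the caller's own <koder_user_id>/ prefix:

package blob

import (
    "fmt"
    "strconv"
)

// ObjectKey builds the tenant-scoped object path from the template above.
// A nil workspaceID maps to the literal "personal" segment.
func ObjectKey(koderUserID int64, workspaceID *int64, resourceID int64, file string) string {
    ws := "personal"
    if workspaceID != nil {
        ws = strconv.FormatInt(*workspaceID, 10)
    }
    return fmt.Sprintf("%d/%s/%d/%s", koderUserID, ws, resourceID, file)
}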

Test contract — T1..T9

Every multi-tenant module has a test suite covering:

ID  Test                        Description
T1  Auth required               GET /resource without a PAT → 401
T2  Self read                   A's PAT, GET /my-resource → A's data only
T3  Cross-tenant read denied    A's PAT, GET /resource/<B's id> → 404 (not 403)
T4  Cross-tenant write denied   A's PAT, POST /resource setting koder_user_id=B → 400 or silent override to A
T5  Workspace member read       A in workspace W, GET /resource?workspace=W → all members' data
T6  Workspace non-member read   A not in W, GET /resource?workspace=W → 404
T7  RLS isolation               Direct DB query with koder.uid unset → returns nothing (or error)
T8  Index efficiency            EXPLAIN of A's read uses the tenant index, not a seq scan
T9  Tenant deletion             When user A is deleted, all rows WHERE koder_user_id = A are removed within the retention window

Each implementation ships with tests/multi-tenant/T1..T9_test.go (or equivalent). Audit: a PR without T1..T9 green blocks the merge (see policies/regression-tests.kmd for co-enforcement).
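A sketch of T3 in Go. startServer is a hypothetical module-local fixture (it seeds users A and B plus one resource owned by B, and returns the base URL, B's resource id, and a PAT for A); everything else is stdlib:

package multitenant_test

import (
    "net/http"
    "testing"
)

// T3: reading B's resource with A's PAT must return 404 (not 403) so that
// resource existence never leaks across tenants.
func TestT3CrossTenantReadDenied(t *testing.T) {
    baseURL, resourceOfB, patA := startServer(t) // hypothetical fixture

    req, err := http.NewRequest(http.MethodGet, baseURL+"/resource/"+resourceOfB, nil)
    if err != nil {
        t.Fatal(err)
    }
    req.Header.Set("Authorization", "Bearer "+patA)

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        t.Fatal(err)
    }
    defer resp.Body.Close()

    if resp.StatusCode != http.StatusNotFound {
        t.Fatalf("T3: want 404, got %d", resp.StatusCode)
    }
}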

Error model

Scenario                                                      HTTP  gRPC               Body
No auth                                                       401   UNAUTHENTICATED    {"error": "auth required"}
Invalid token                                                 401   UNAUTHENTICATED    {"error": "invalid token"}
Resource does not exist OR belongs to another tenant          404   NOT_FOUND          {"error": "not found"}
Resource exists but role is insufficient (member w/o write)   403   PERMISSION_DENIED  {"error": "insufficient role"}
Bad input                                                     400   INVALID_ARGUMENT   {"error": "<details>"}
Server error                                                  500   INTERNAL           {"error": "internal"}

Critical: return 404, not 403, in cross-tenant cases. 403 leaks existence ("this id exists but you cannot read it" → the attacker learns the id exists).
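One way to make that rule hard to violate is to collapse "missing" and "another tenant's" into a single sentinel at the data layer and map errors centrally. A sketch; all names are illustrative:

package httpapi

import (
    "encoding/json"
    "errors"
    "net/http"
)

// Sentinel errors raised by handlers.
var (
    ErrNoAuth       = errors.New("auth required")
    ErrInvalidToken = errors.New("invalid token")
    ErrNotFound     = errors.New("not found") // missing OR another tenant's
    ErrRole         = errors.New("insufficient role")
)

// WriteError maps an error onto the table above. The data layer returns
// ErrNotFound for cross-tenant hits, so they land on 404 by construction.
func WriteError(w http.ResponseWriter, err error) {
    code, msg := http.StatusInternalServerError, "internal"
    switch {
    case errors.Is(err, ErrNoAuth):
        code, msg = http.StatusUnauthorized, "auth required"
    case errors.Is(err, ErrInvalidToken):
        code, msg = http.StatusUnauthorized, "invalid token"
    case errors.Is(err, ErrNotFound):
        code, msg = http.StatusNotFound, "not found"
    case errors.Is(err, ErrRole):
        code, msg = http.StatusForbidden, "insufficient role"
    }
    w.Header().Set("Content-Type", "application/json")
    w.WriteHeader(code)
    json.NewEncoder(w).Encode(map[string]string{"error": msg})
}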

Audit log

Every mutating operation that touches PII writes an audit row:

CREATE TABLE audit_log (
  id BIGSERIAL PRIMARY KEY,
  actor_user_id BIGINT NOT NULL,    -- the PAT owner
  target_user_id BIGINT,            -- tenant being acted on (often = actor)
  action TEXT NOT NULL,             -- 'create' | 'update' | 'delete' | 'read_admin'
  resource TEXT NOT NULL,           -- 'credentials' | 'usage' | …
  resource_id BIGINT,
  payload JSONB,
  created_unix BIGINT NOT NULL
);
CREATE INDEX ix_audit_actor ON audit_log (actor_user_id, created_unix DESC);
CREATE INDEX ix_audit_target ON audit_log (target_user_id, created_unix DESC);

The audit row is a best-effort write (a failure is logged but does not abort the action; see the flow#056b policy).
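A sketch of the best-effort semantics, assuming database/sql; the Record helper and Entry struct are illustrative:

package audit

import (
    "context"
    "database/sql"
    "log"
    "time"
)

// Entry mirrors the audit_log columns.
type Entry struct {
    ActorUserID  int64
    TargetUserID *int64 // nil when there is no distinct target
    Action       string // 'create' | 'update' | 'delete' | 'read_admin'
    Resource     string // 'credentials' | 'usage' | …
    ResourceID   *int64
    Payload      string // JSON, stored as JSONB
}

// Record is best-effort by construction: a failed insert is logged and
// swallowed, never propagated, so the business action is not aborted.
func Record(ctx context.Context, db *sql.DB, e Entry) {
    _, err := db.ExecContext(ctx, `
        INSERT INTO audit_log
          (actor_user_id, target_user_id, action, resource, resource_id, payload, created_unix)
        VALUES ($1, $2, $3, $4, $5, $6, $7)`,
        e.ActorUserID, e.TargetUserID, e.Action, e.Resource, e.ResourceID, e.Payload,
        time.Now().Unix())
    if err != nil {
        log.Printf("audit: best-effort write failed (action not aborted): %v", err)
    }
}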

Sharding model (future, hyperscale)

Once a table passes ~10M rows or ~100K active tenants:

  • Range-shard by koder_user_id (TiKV PD does this automatically in kdb-next)
  • Hash-shard via hash(koder_user_id) % N (Postgres alternative with Citus / pg_partman)
  • Geo-shard by tenant region (multi-region future, see stack-RFC-001 §faseamento)

Do not pre-optimize. Trigger: monitoring flags p99 latency > 50ms or table size > 1TB.

Edge cases

User rename / handle change

koder_user_id is immutable: the handle (@username) can change, the ID cannot. Every cross-table reference uses koder_user_id (BIGINT), never the handle.

Workspace transfer

When a workspace changes owner: workspace.owner_id changes; workspace_id stays the same. Resources with workspace_id = X remain accessible to the current members.

Account deletion (GDPR-style)

When a user requests deletion:

  1. Set user.deleted_unix = NOW() (soft delete)
  2. A retention cron job sweeps the tables and deletes rows WHERE koder_user_id = X AND <table-specific retention> (see the sketch after this list)
  3. An audit row in audit_log records "user_deleted" before the cleanup
  4. Default retention: 30 days (configurable per-tenant for compliance).
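A sketch of the step-2 sweep for one table, leaving the <table-specific retention> predicate as a placeholder; SweepUser and its signature are illustrative:

package retention

import (
    "context"
    "database/sql"
    "time"
)

// SweepUser removes a soft-deleted user's rows from one table once the
// retention window (default 30 days) since deleted_unix has elapsed.
// The table name must come from a vetted allowlist, never user input.
func SweepUser(ctx context.Context, db *sql.DB, table string, uid, deletedUnix int64, retention time.Duration) (int64, error) {
    if time.Now().Unix() < deletedUnix+int64(retention.Seconds()) {
        return 0, nil // still inside the retention window
    }
    res, err := db.ExecContext(ctx,
        "DELETE FROM "+table+" WHERE koder_user_id = $1", uid) // AND <table-specific retention>
    if err != nil {
        return 0, err
    }
    return res.RowsAffected()
}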

Account merge

Out of scope: Koder Stack does not support automatic account merging. Admin-only manual operation if needed.

Spec audit

Automated applicability (future: koder-spec-audit multi-tenancy):

  • Scans migrations: tables with PII columns (email, name, password*, key*) but no koder_user_id → flag
  • Scans routers: endpoints without auth middleware → flag
  • Scans code: SELECT * FROM <pii-table> without a WHERE clause → flag
  • Scans env vars: shared cache keys without a tenant prefix → flag

Severity: error (blocks the release) starting with the first release that adopts the audit; advisory before that.
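A toy version of the first check, just to fix the shape of the audit; a real implementation would parse migrations rather than regex them, and all names here are illustrative:

package specaudit

import (
    "regexp"
    "strings"
)

// piiCol matches column names the audit treats as PII.
var piiCol = regexp.MustCompile(`(?i)\b(email|name|password\w*|key\w*)\b`)

// FlagMigration reports whether a migration declares PII-looking columns
// without a koder_user_id tenant field.
func FlagMigration(sqlText string) bool {
    if !piiCol.MatchString(sqlText) {
        return false
    }
    return !strings.Contains(strings.ToLower(sqlText), "koder_user_id")
}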

References