Skip to content

pg-dump-v17

postgres-compat specs/postgres-compat/pg-dump-v17.kmd

Specification body

pg-dump v17 — Wire Compatibility Spec

Normative spec for what pg_dump --version 17 expects from a Postgres-wire server. Used as the contract that infra/data/kdb (kdb-pgwire / kdb-gateway) must satisfy to support pg_dump round-trips, and as the source-of-truth for the per-table coverage work tracked in infra/data/kdb#341.

1. Provenance

The catalogue of queries below was captured live from PostgreSQL 17.9 (Debian 17.9-0+deb13u1) using infra/data/kdb/scripts/pgdump-v17-capture.sh.

The script spins a throw-away PG 17 cluster, applies a varied schema (5 tables, 4 indexes including partial, 2 explicit sequences, FK + CHECK constraints, view, materialized view, enum type), and runs pg_dump under log_statement = all so the server records the exact stream of statements pg_dump issued.

Captures are checked in under meta/docs/stack/specs/postgres-compat/captures/pg-dump-v17-queries-<flag>.sql for the following flag combinations:

File suffixFlags
default(no flags)
schema-only--schema-only
data-only--data-only
no-owner-no-acl--no-owner --no-acl
table-users--table=users

Re-capture by running the script — output is byte-stable across runs of the same PG 17 minor (pg_dump does not emit non-deterministic queries).

2. Session prelude (every invocation)

The first ~13 statements in every pg_dump run are session-shape GUCs. Every server claiming pg_dump-compat MUST accept these without error (returning SET / SELECT is sufficient; the values can be ignored if the server has no equivalent knob):

SELECT pg_catalog.set_config('search_path', '', false);
SELECT pg_catalog.pg_is_in_recovery();
SELECT pg_catalog.set_config('search_path', '', false);
SET DATESTYLE = ISO;
SET INTERVALSTYLE = POSTGRES;
SET extra_float_digits TO 3;
SET synchronize_seqscans TO off;
SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET transaction_timeout = 0;
SET row_security = off;
SELECT set_config(name, 'view, foreign-table', false)
  FROM pg_settings
  WHERE name = 'restrict_nonsystem_relation_kind';
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY;

The final two open a read-only repeatable-read transaction that wraps the entire dump; servers without MVCC snapshots can accept these as no-ops provided they expose a stable read for the duration of the dump.

3. Catalogue queries — required surface

pg_dump v17 issues queries against the following pg_catalog relations. Servers MUST respond with the row shape PostgreSQL 17 produces; missing columns cause pg_dump to bail with a parser error in its PQgetvalue(... , <col>) calls. Functions in bold are out-of-band helpers (not relations) — they MUST exist and return the documented type.

3.1 Relations (must be queryable, even if empty)

RelationMin. columns pg_dump v17 readsNotes
pg_rolesoid, rolnameEmpty result OK if --no-owner
pg_extensiontableoid, oid, extname, extrelocatable, extversion, extconfig, extconditionEmpty result OK; pg_dump iterates the list
pg_dependclassid, objid, refobjid (+ refclassid filter, deptype filter)Drives extension-membership detection
pg_namespacetableoid, oid, nspname, nspowner, nspacl (+ acldefault('n', nspowner))At minimum public and pg_catalog
pg_classtableoid, oid, relname, relnamespace, relkind, reltype, relowner, relchecks, relhasindex, relhasrules, relpages, relhastriggers, relpersistence, reloftype, relacl, relfrozenxid, relreplident, relrowsecurity, relforcerowsecurity, relminmxid, reloptions, relispopulated, relispartitionDrives the entire object stream — see §4 below
pg_proctableoid, oid, proname, prolang, pronargs, proargtypes, prorettype, proacl, pronamespace, proowner, prokindFilter prokind <> 'a' (non-aggregates)
pg_typetableoid, oid, typname, typnamespace, typacl, typowner, typelem, typrelid, typtype, typisdefinedUsed to resolve column types
pg_languagetableoid, oid, lanname, lanpltrusted, lanplcallfoid, laninline, lanvalidator, lanacl, lanownerFilter WHERE lanispl
pg_attributeattname, atttypmod, attstorage, attcompression, attmissingval, attstattarget, attndims, attlen, attbyval, attalign, attnotnull, atthasdef, attidentity, attgenerated, attisdropped, attislocal, attinhcountR-CRITICAL — column shape drives CREATE TABLE reconstruction
pg_constraintoid, conname, contype, conrelid, conindid, connamespace, confupdtype, confdeltype, confmatchtype, condeferrable, condeferred, convalidated, conkey, confkey, confrelidCHECK / FK / UNIQUE reconstruction
pg_indexindrelid, indkey, indcollation, indclass, indoption, indpred, indexprs, indisclustered, indisreplident, indisvalid, indisready, indisliveDrives CREATE INDEX reconstruction
pg_inheritsinhrelid, inhparent, inhseqnoPartition / inheritance tree
pg_descriptionobjoid, classoid, objsubid, descriptionCOMMENT ON … reconstruction
pg_publicationoid, pubname, pubowner, puballtables, pubinsert, pubupdate, pubdelete, pubtruncate, pubviarootEmpty result OK
pg_settingsname, settingset_config(...) FROM pg_settings WHERE name = '…'
pg_foreign_tableftrelid, ftserverEmpty result OK if no FDWs

3.2 Relations newly required by PG 17 (gaps vs PG 15/16)

These either did not exist or had different shape in earlier PG and are likely to be missing in a server claiming PG 13/14 compatibility:

RelationWhy pg_dump v17 needs it
pg_castResolves implicit-cast paths during type reconstruction
pg_collationPer-column / per-index collation names
pg_conversionServer-encoding conversions
pg_default_aclDefault privileges from ALTER DEFAULT PRIVILEGES …
pg_event_triggerEvent triggers (DDL hooks) — empty result usually OK
pg_init_privsCaptures the "original ACL" assigned by an extension
pg_opclassIndex operator class lookup
pg_operatorOperator definitions (mostly transitive via opfamily)
pg_opfamilyIndex operator family
pg_publication_namespacePublication-by-schema (PG 15+)
pg_rangeRange type bounds
pg_rewriteView / materialized-view definitions (joined with pg_get_viewdef)
pg_seclabelsSecurity labels (SECURITY LABEL …)
pg_sequenceSequence parameters (PG 10+ split-out of pg_class.relkind = 'S')
pg_statistic_extExtended statistics objects
pg_tablespaceNon-default tablespaces — usually empty in cloud setups
pg_transformTransform mappings (e.g. plpython hstore)
pg_ts_config / pg_ts_dict / pg_ts_parser / pg_ts_templateFull-text-search assets

3.3 Functions

FunctionReturnWhen pg_dump v17 calls it
pg_is_in_recovery()boolSession prelude — almost certainly returns false
pg_get_viewdef(oid)textView / materialized-view body
pg_get_indexdef(oid)textCREATE INDEX … reconstruction
pg_get_constraintdef(oid)textCHECK / FK / UNIQUE body
pg_get_triggerdef(oid)textTrigger reconstruction
pg_get_expr(node, relid)textDefault-value / partial-index predicate body
pg_get_serial_sequence(table, col)textReidrata SERIAL / IDENTITY columns
pg_options_to_table(text[])setofFDW server OPTIONS decoding
acldefault(char, oid)aclitem[]Default-ACL synthesis (called inline in §3.1 queries)
array_remove(any[], any)any[]Used inline to drop check_option=… from reloptions

3.4 PG 17 row-shape additions

The big v15→v17 movement in pg_attribute matters:

  • attstorage is "char" in v17 (was oid historically; some legacy shims still return oid — pg_dump v17 will reject those rows).
  • attcompression exists from v14+ ("char", default 'p' for pglz).
  • attmissingval is anyarray (often emitted as text).
  • attstattarget is int2 (became nullable in v17 — NULL allowed).
  • attgenerated is "char" ('s' for STORED, '' for none).
  • attidentity is "char" ('a' for ALWAYS, 'd' for BY DEFAULT, '' for none).

pg_class.relrowsecurity and relforcerowsecurity are bool (RLS-related). pg_class.relispartition is bool (PG 10+); MUST exist even on non-partitioned tables (default false).

4. Object-fetch loop (the heart of pg_dump)

After the session prelude, pg_dump runs one large pg_class query to materialize the entire table-of-objects:

SELECT c.tableoid, c.oid, c.relname, c.relnamespace, c.relkind, c.reltype,
       c.relowner, c.relchecks, c.relhasindex, c.relhasrules, c.relpages,
       c.relhastriggers, c.relpersistence, c.reloftype, c.relacl,
       acldefault(CASE WHEN c.relkind = 'S' THEN 's'::"char" ELSE 'r'::"char" END,
                  c.relowner) AS acldefault,
       CASE WHEN c.relkind = 'f'
            THEN (SELECT ftserver FROM pg_catalog.pg_foreign_table WHERE ftrelid = c.oid)
            ELSE 0 END AS foreignserver,
       c.relfrozenxid, tc.relfrozenxid AS tfrozenxid, tc.oid AS toid,
       tc.relpages AS toastpages, tc.reloptions AS toast_reloptions,
       d.refobjid AS owning_tab, d.refobjsubid AS owning_col,
       tsp.spcname AS reltablespace, false AS relhasoids,
       c.relispopulated, c.relreplident, c.relrowsecurity,
       c.relforcerowsecurity, c.relminmxid, tc.relminmxid AS tminmxid,
       array_remove(array_remove(c.reloptions, 'check_option=local'),
                                  'check_option=cascaded') AS reloptions,
       CASE WHEN 'check_option=local' = ANY (c.reloptions) THEN 'LOCAL'::text
            WHEN 'check_option=cascaded' = ANY (c.reloptions) THEN 'CASCADED'::text
            ELSE NULL END AS checkoption,
       am.amname, (d.deptype = 'i') IS TRUE AS is_identity_sequence,
       c.relispartition AS ispartition
FROM pg_class c …

(Truncated — see captures/pg-dump-v17-queries-schema-only.sql for the full text.) The relkind values returned drive the rest of the dump:

relkindFollowed by
'r'pg_attributepg_constraintpg_indexLOCK TABLE … ACCESS SHARECOPY … TO STDOUT (data-only)
'v'pg_get_viewdef(oid)
'm'pg_get_viewdef(oid) (materialized) + REFRESH MATERIALIZED VIEW directive
'S'pg_sequence row + setval('…', last_value, is_called)
'i'Index — usually emitted by following pg_index
'p'Partitioned table — followed by pg_inherits to enumerate children
'c'Composite type — pg_dump may skip if the type is system-managed
'f'Foreign table — pulls the server name from the foreignserver correlated subquery

The LOCK TABLE … IN ACCESS SHARE MODE is non-negotiable: servers MUST either implement table locks or accept the syntax as a no-op (returning LOCK TABLE). Without it, pg_dump v17 aborts the dump.

5. Data path (--data-only)

For each non-system table whose relkind = 'r', pg_dump issues:

COPY public.users (id, handle, email, score, created_at,
                   tags, metadata, avatar, status) TO STDOUT;

The column list is materialized from the pg_attribute query above. The server MUST emit the response on the wire as CopyOutResponse → CopyData* → CopyDone → CommandComplete("COPY n"). Text format is the default; binary is not used by pg_dump v17.

Implementation in kdb-pgwire: ticket infra/data/kdb#340.

6. Out-of-scope for v17 compat

  • Foreign-key reconstruction across schemas with --schema=… — pg_dump v17 chases FKs to outside-schema parents; the audit script pins to public so the captures do not exercise this corner.
  • Server-side COMMENT ON LARGE OBJECT — large-object dump uses a separate code path (--blobs / --no-blobs) not exercised by the audit. Tracked separately if/when lobs become a goal.
  • Parallel pg_dump (-j N) — parallel restore uses extra COPY sessions but the per-session query stream is the same.
  • pg_dump --inserts — uses INSERT INTO … VALUES (…) instead of COPY. Audit captures focus on the default COPY path.

7. Versioning

Captures are pinned to PG 17.9. Re-run the capture script when:

  1. PostgreSQL 18 ships and pg_dump v18 diverges (verify with a fresh capture against PG 18 in /usr/lib/postgresql/18/bin/).
  2. A user reports a pg_dump failure on kdb-gateway that the diff report does not cover.

See kdb-gateway-v0.0.13-vs-pg17-pgdump.md for the per-query coverage delta against the current kdb-pgwire surface.

8. Conformance status (2026-05-11)

The 8 originally-identified R-CRITICAL gaps have been closed across a multi-day series of fixes culminating in the FK round-trip milestone. Latest harness state (scripts/pg-dump-roundtrip.sh):

CheckState
pg_dump --schema-only exit code0
pg_dump --data-only exit code0
psql -f kdb-schema.sql (PG 17)rc=0
psql -f kdb-data.sql (PG 17)rc=0
Row-count round-trip100% match
FK guard (orphan INSERT rejected)blocked (enforced) ✓
Schema diff lines35 (was 77 originally)

Closed work-items

  • #341 umbrella — pg_catalog coverage for the 8 R-CRITICAL gaps
  • #342 — round-trip test harness
  • #343 — extended-query set_config emission
  • #344 (16 waves) — universal tableoid + per-table column gaps
  • #345 — planner IS TRUE / IS FALSE
  • #346 — subquery catalog inheritance
  • #347IS DISTINCT FROM
  • #348 — text [N] subscript
  • #349ARRAY (SELECT …) constructor
  • #350Text → Array<T> cast
  • #351pg_catalog. prefix-strip for aggregates
  • #353 — FK round-trip umbrella (PK ALTER, FK ALTER, conindid linkage)
  • #357 — regclass cast → oid
  • #360 — FK capture under tenant config
  • #361 — synthetic pg_index + pg_class for PK/UNIQUE

Open polish items (post-milestone)

  • #352 — schema-diff cosmetic cleanup (35 lines remaining; \restrict random tokens unavoidable; public SCHEMA placeholder block deferred)
  • #359 — public SCHEMA block suppression (requires pg_dump v17 source vendoring to map the suppression algorithm)

Regression coverage

infra/data/kdb/crates/kdb-pgwire/tests/pgwire_fk_roundtrip.rs guards the milestone's three load-bearing wirings (see registry entries 518-520).