Speaking Postgres

May 02, 2026
#postgres #database #wire-protocol

When we type psql -h localhost -U postgres or boot up an application powered by a Node.js pg driver, we expect to talk to PostgreSQL. We expect to write SQL queries that hit a storage engine consisting of tablespaces, heaps, and write-ahead logs (WAL).

But in the modern database landscape, looks can be highly deceiving.

Under the hood, that connection might not be hitting Postgres at all. It could be talking to CockroachDB (a distributed NewSQL database), ClickHouse (a columnar OLAP store), QuestDB (a time-series database), or a middleman proxy like PgBouncer or Supavisor.

Conversely, we might be interacting with a service that feels nothing like Postgres such as a PostgREST API or a GraphQL endpoint on Supabase while a Postgres database is doing all the heavy lifting behind the scenes.

How do completely different systems mimic as one another so seamlessly? The answer lies in the Postgres Wire Protocol (pgwire). It acts as the Rosetta Stone of the database world, defining a strict, clean separation between the database interface and the database execution engine.

Let’s unpack how this protocol works under the hood, how “faux” Postgres engines abuse it to hijack the massive Postgres ecosystem, and how connection poolers parse its exact byte streams to multiplex millions of queries without breaking.


The Rise of the “Faux” Postgres

Building a new database from scratch is notoriously hard. But writing the storage and execution engine is only half the battle. Once it is built, how do one convince the world to use it? How do you get developers to write drivers for Elixir, Go, Python, and Rust? How do you get GUI tools like DBeaver or PgAdmin, and BI giants like Tableau to support your new datastore?

We don’t. Instead, we build a compatibility layer that speaks “pgwire”.

By implementing the Postgres wire protocol, a database instantly inherits a mature, highly optimized, decades old ecosystem of client libraries, IDEs, and tools.

We can map this architectural trend across a simple matrix:

Storage / Engine Client Interface Examples Why Do This?
True Postgres Postgres Protocol PostgreSQL, Amazon Aurora, Google Cloud SQL The standard, vanilla database experience.
Faux Postgres Postgres Protocol CockroachDB, ClickHouse (via pgwire port), QuestDB, Materialize Inherit the entire ecosystem (drivers, ORMs, GUIs) while using a custom storage/execution engine.
True Postgres Non-Postgres Protocol PostgREST, Supabase REST/GraphQL APIs Translate HTTP/JSON requests into highly optimized SQL, hiding database complexity from the frontend.

When you connect to CockroachDB with a standard Postgres client, the client has absolutely no idea it is talking to a distributed key-value storage engine written in Go. It sends standard Postgres wire messages, and CockroachDB answers with standard Postgres responses.


Anatomy of the Protocol

At its core, the Postgres wire protocol is a stream based, message-oriented protocol. It typically runs over TCP/IP (on port 5432 by default).

Almost every message in the protocol follows a remarkably simple, three-part layout:

  1. Message Type (1 byte): A single ASCII character identifying the message purpose (e.g., Q for a simple query, C for command completion, E for errors).
  2. Length (4 bytes): A 32-bit big-endian integer specifying the total length of the message in bytes (including these 4 length bytes, but excluding the 1-byte identifier).
  3. Payload: The message-specific parameters or data.

WHY THIS MATTERS: Any driver, proxy, or tool can parse ALL Postgres messages with ONE simple loop:

--> read 1 byte (type) 
--> read 4 bytes (length) 
--> read (length-4) bytes (payload) 
--> dispatch on type 
--> repeat

The Handshake Exception

To connect, the client must send a startup package containing the protocol version, username, and database name.

However, this initial handshake message does not start with a 1-byte message type identifier. Instead, it starts directly with the 4-byte length. The server reads the first 4 bytes, determines the size of the packet, and then parses the payload to initialize the connection state.

Here is the lifecycle of a typical startup handshake:

Once the client receives the ReadyForQuery (Z) message, the connection is fully established, and the database is ready to accept commands.


Simple vs. Extended Query Protocols

Once connected, how do we send queries? The protocol offers two distinct channels: the Simple Query Protocol and the Extended Query Protocol.

1. Simple Query Protocol (The One-Shot)

The Simple Query protocol is extremely straightforward. The client packs raw, plain-text SQL into a single message starting with the character Q and sends it to the server.

  • Pros: Dead simple. Requires exactly one round trip to execute a query.
  • Cons: Extremely vulnerable to SQL injection (since code and data are concatenated in a single string). It is also inefficient for repeated queries because the server has to parse, compile, and plan the SQL execution path every single time.

2. Extended Query Protocol (The Safe & Fast Multi-Step)

To combat SQL injection and optimize performance, the Extended Query Protocol breaks query execution into a multi-phase, pipelined dance using four key messages: Parse, Bind, Execute, and Sync.

Instead of embedding data directly into the SQL string, the SQL contains parameter placeholders ($1, $2, etc.).

Let’s break down these phases:

  • Parse (P): The client sends the SQL blueprint containing placeholders. The server parses the SQL, compiles it, generates an execution plan, and caches it in memory under a statement name (e.g., s1).
  • Bind (B): The client tells the server to inject concrete values (e.g., 42) into the prepared statement (s1) and creates an executable “portal” (e.g., p1). Because parameters are sent separately from the SQL logic, SQL injection is physically impossible.
  • Execute (E): The client requests the server to run the portal p1 and optionally sets a limit on how many rows to return.
  • Sync (S): This acts as a commit signal for the batch. Until the server receives Sync, it holds onto the instructions. Once received, it processes the pipeline and sends back the results.

The Pulse of Poolers: multiplexing with the ‘Z’ byte

If you run a high-traffic application, establishing a brand-new TCP connection and performing the startup handshake for every single request is incredibly slow. To bypass this, we use database connection poolers like PgBouncer and Supavisor.

Connection Pooler sits in the middle: your application thinks it’s talking directly to Postgres, and Postgres thinks it’s talking to your application. Behind the scenes, Connection Pooler maintains a pool of warm, authenticated connections to Postgres and multiplexes your incoming application queries over them.

But how does Connection Pooler know exactly when it is safe to snatch a server connection away from one client and hand it to another?

It parses the ReadyForQuery (Z) message.

The ReadyForQuery message has a payload of exactly 1 byte. This single character represents the backend’s transaction status:

  • I (Idle): The backend is currently idle. No active transaction block is open.
  • T (In Transaction): The backend is currently inside a live transaction block (started via BEGIN).
  • E (Failed Transaction): The backend is in a failed transaction block. An error occurred, and all subsequent queries will be rejected until a ROLLBACK is issued.

Session vs. Transaction vs. Statement Pooling

Depending on how aggressively Connection Pooler parses these status bytes, you can configure three pooling modes:

  1. Session Pooling: A client gets a server connection assigned to it when it connects, and keeps it until it disconnects. It is perfectly safe, but does not solve connection scaling issues.
  2. Transaction Pooling: Connection Pooler assigns a server connection to a client only for the duration of a single transaction. When Connection Pooler parses a ReadyForQuery message containing an I (Idle) or E (Failed) status byte (meaning the transaction ended via COMMIT or ROLLBACK), it immediately reclaims the server connection and returns it to the pool.
  3. Statement Pooling: The most aggressive mode. A connection is assigned only for a single statement. Transactions (BEGIN / COMMIT) are completely broken in this mode.

The Prepared Statement Trap

While Transaction and Statement pooling are great for scale, they introduce a massive headache: they completely break Prepared Statements.

Recall how the Extended Query Protocol works. A client sends a Parse message to create a prepared statement s1 inside the Postgres server’s memory. In a standard setup, that statement lives in the session memory of that specific connection.

Under Transaction/Statement Pooling, PgBouncer, for example, breaks the 1:1 relationship between your client session and the server connection:

  1. Client sends Parse (s1) -> PgBouncer routes it to Server Connection A. The prepared statement s1 is compiled on Connection A.
  2. The transaction ends, and PgBouncer returns Connection A to the pool.
  3. Client sends Bind (s1) -> PgBouncer routes this request to Server Connection B (which is currently free).
  4. Connection B has absolutely no record of s1! It throws a fatal error: ERROR: prepared statement "s1" does not exist.

This is why many modern ORMs (like Prisma or Rails’ ActiveRecord) require you to disable prepared statements (e.g., adding ?pgbouncer=true or setting prepared_statements: false) if you are deploying behind PgBouncer.

Modern poolers and proxy layers (like Supabase’s Supavisor or newer versions of PgBouncer) work around this by implementing their own prepared statement tracking layers intercepting Parse and Bind messages on the fly and compiling them on demand.


Wrapping Up

The Postgres Wire Protocol is a masterpiece of engineering. By defining a simple, structured message frame, it paved the way for a vibrant, multi-billion-dollar database ecosystem.

Whether you are scaling connections with PgBouncer, running massive analytical queries on ClickHouse, or scaling globally with CockroachDB, the same humble TCP stream of Q and Z bytes is working silently in the background, keeping the modern data layer talking.

Next time you query your database, remember: you aren’t just writing SQL you are speaking Postgres.