Postgres workflows make durable execution feel boring

Postgres workflows are getting a fresh look because DBOS argues that durable execution does not always need a separate orchestration service. The pitch is simple: store workflow state, step outputs, locks, and recovery checkpoints in PostgreSQL, then let application servers coordinate through the database they already operate.

The short version

  • DBOS describes a durable execution model where application servers poll a Postgres workflows table, checkpoint each step, and recover crashed jobs from the last completed step.
  • The technical bet is that row locking, uniqueness constraints, indexes, SQL queries, and normal Postgres operations can replace a chunk of what teams buy from external orchestrators.
  • This is most attractive when the workflow is close to the application domain and the team already trusts Postgres in production.
  • The hard parts do not disappear. Payload size, hot tables, transaction retries, worker crashes, and retry semantics still need explicit design.
  • The broader developer-tool angle is practical: agent runs, video processing, document pipelines, and AI background jobs all need durable execution, but many teams do not want another distributed system first.

What happened

DBOS published a technical argument for Postgres workflows as a simpler durable execution architecture. In the conventional model, systems such as Temporal, Airflow, and AWS Step Functions coordinate workflow execution through a central orchestrator. A worker completes a step, reports the result to the orchestrator, and the orchestrator records the checkpoint before dispatching the next step.

DBOS flips that arrangement. A client creates a workflow record in Postgres. Application servers dequeue work from the table, checkpoint step outputs directly to Postgres, and recover another server’s unfinished work if a process dies. The post points to locking clauses for safe worker competition, integrity constraints for detecting duplicate step writes, SQL for observability, and existing Postgres security and availability practices for operations.
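
A minimal sketch of the checkpoint-and-recover idea follows. It is not DBOS's schema or API: the table name step_outputs, the function run_workflow, and the crash_before parameter are hypothetical. Python's built-in sqlite3 stands in for Postgres only so the example runs without a server. The primary key on (workflow_id, step_index) plays the role of the integrity constraint that rejects a duplicate step write.

```python
import sqlite3

# sqlite3 is a stand-in so the sketch runs without a server.
# Table and function names are hypothetical, not DBOS's schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE step_outputs (
        workflow_id TEXT NOT NULL,
        step_index  INTEGER NOT NULL,
        output      TEXT NOT NULL,
        PRIMARY KEY (workflow_id, step_index)  -- rejects duplicate step writes
    )
    """
)


def run_workflow(conn, workflow_id, steps, crash_before=None):
    """Run steps in order, skipping any step that already has a checkpoint.

    Returns the indexes of the steps that actually executed in this call.
    crash_before simulates a worker dying just before the given step.
    """
    executed = []
    for index, step in enumerate(steps):
        row = conn.execute(
            "SELECT output FROM step_outputs"
            " WHERE workflow_id = ? AND step_index = ?",
            (workflow_id, index),
        ).fetchone()
        if row is not None:
            continue  # checkpoint exists: do not re-run the step
        if crash_before == index:
            raise RuntimeError("simulated worker crash")
        output = step()
        with conn:  # commit the checkpoint before moving to the next step
            conn.execute(
                "INSERT INTO step_outputs VALUES (?, ?, ?)",
                (workflow_id, index, output),
            )
        executed.append(index)
    return executed
```

Calling run_workflow a second time with the same workflow_id after a simulated crash re-runs only the steps that have no checkpoint row. A step that crashes after its side effect but before its checkpoint commits would still run again, which is why the steps themselves need to be idempotent.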

The article also claims that a single Postgres server can handle tens of thousands of workflows per second in the right setup, with distributed or sharded Postgres systems as later options. That number is less useful than the shape of the claim: durable execution is mostly about making progress durable, and a relational database is already built to make state durable.

Why this is worth watching

Postgres workflows are interesting because they move the orchestration question back into the data model. If each step result is a row with clear idempotency rules, the system becomes easier to inspect. A failed payment email, stuck file conversion, or half-finished AI agent run can be queried with SQL before anyone builds a custom dashboard.
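
To make the "query it with SQL" point concrete, here is a small sketch. The workflows table, its columns, and the find_stuck helper are assumptions for illustration, and sqlite3 again stands in for Postgres so the code runs as written. The query itself is plain SQL that would look much the same against a Postgres table.

```python
import sqlite3

# Hypothetical workflow table; sqlite3 is a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflows ("
    " id TEXT PRIMARY KEY, name TEXT, status TEXT, updated_at INTEGER)"
)
conn.executemany(
    "INSERT INTO workflows VALUES (?, ?, ?, ?)",
    [
        ("wf-1", "payment_email", "failed", 1000),
        ("wf-2", "file_conversion", "running", 1000),
        ("wf-3", "agent_run", "running", 1900),
        ("wf-4", "agent_run", "done", 1950),
    ],
)


def find_stuck(conn, now, max_age_seconds):
    """Return failed workflows plus running ones with no recent update."""
    return conn.execute(
        """
        SELECT id, name, status
          FROM workflows
         WHERE status = 'failed'
            OR (status = 'running' AND updated_at < ?)
         ORDER BY id
        """,
        (now - max_age_seconds,),
    ).fetchall()
```

With the sample rows above, find_stuck(conn, now=2000, max_age_seconds=600) returns the failed payment email and the file conversion that has not updated recently, but not the agent run that is still making progress.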

That is the best version of this idea. It does not say every team should replace Temporal tomorrow. It says many teams reach for a workflow platform before they have written down the actual state machine, retry boundary, and checkpoint model. Starting with Postgres can force those decisions into tables, indexes, and constraints. That can be refreshingly boring.

There is also a product lesson here for developer-tool builders. The IT & AI archive keeps circling the same theme: teams want more reliability for background work, but they have little patience for heavy platforms unless the pain is already obvious. Postgres workflows fit that mood. They offer a path between ad hoc job queues and a full workflow stack.

What Hacker News readers are arguing about

The Hacker News discussion is useful because it separates the slogan from the operational details. Several engineers liked the general pattern, especially for queues built with SELECT FOR UPDATE SKIP LOCKED or advisory locks. The pro-Postgres camp mostly argued from experience: if Postgres is already in the stack, a workflow table can be cheaper and easier to reason about than another service.
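
The queue pattern those commenters describe can be sketched roughly as follows. The workflow_queue table, its columns, and the claim_next function are hypothetical, and the code assumes a psycopg-style DB-API connection to a real Postgres server (the %s placeholders follow that driver's convention). FOR UPDATE SKIP LOCKED is the Postgres locking clause that lets a worker skip rows another transaction has already locked instead of waiting on them.

```python
# Hypothetical table and column names. The SQL targets Postgres.
CLAIM_SQL = """
UPDATE workflow_queue
   SET status = 'running', claimed_by = %s
 WHERE id = (
        SELECT id
          FROM workflow_queue
         WHERE status = 'pending'
         ORDER BY created_at
         LIMIT 1
           FOR UPDATE SKIP LOCKED
       )
RETURNING id
"""


def claim_next(conn, worker_id):
    """Claim one pending workflow for this worker.

    conn is assumed to be a psycopg-style connection. Returns the claimed
    workflow id, or None when no unlocked pending row is available.
    """
    with conn.cursor() as cur:
        cur.execute(CLAIM_SQL, (worker_id,))
        row = cur.fetchone()
    conn.commit()  # make the claim visible to other workers
    return row[0] if row else None
```

Because the row lock and the status change happen in one statement, two workers polling at the same time should not claim the same row. What this sketch leaves out is the recovery side: a worker that crashes after claiming leaves a row stuck in 'running', so a real design still needs a timeout or heartbeat to return such rows to the queue.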

The skepticism was more specific. One thread challenged the article’s mention of CockroachDB as a way to scale Postgres-like systems, with commenters pointing to compatibility gaps, missing operators, index limitations, and repeated serialization_failure retries in real systems. That is a reminder that “Postgres-compatible” is not the same as “Postgres with the same operational behavior.”

Temporal also dominated part of the thread. Some commenters described large self-hosted Temporal deployments as expensive and infrastructure-heavy, while others pushed back that those workloads may be a poor fit or that Temporal Cloud pricing can look reasonable depending on event volume. The useful takeaway is not that Temporal is bad. It is that workflow engines have their own cost curve, and teams should compare that curve against the complexity they would add to Postgres.

A smaller but important thread focused on payload size. People were wary of putting large documents or video artifacts directly in a queue or workflow table. The practical pattern is the old claim-check approach: store the large object elsewhere, then pass a reference through the workflow state. That applies whether the orchestrator is Postgres, Temporal, or a cloud queue.
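
A rough illustration of the claim-check approach, under stated assumptions: a temporary directory stands in for an object store, and store_blob, load_blob, and the blob_key field are hypothetical names. The point is only that the workflow state carries a small reference while the large artifact lives somewhere else.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_blob(store_dir, data):
    """Write bytes to the blob store and return a small reference dict."""
    digest = hashlib.sha256(data).hexdigest()
    (Path(store_dir) / digest).write_bytes(data)
    return {"blob_key": digest, "size": len(data)}


def load_blob(store_dir, ref):
    """Resolve a reference back to bytes, verifying the content hash."""
    data = (Path(store_dir) / ref["blob_key"]).read_bytes()
    if hashlib.sha256(data).hexdigest() != ref["blob_key"]:
        raise ValueError("blob does not match its reference")
    return data


# A temporary directory stands in for an object store in this sketch.
store_dir = tempfile.mkdtemp()
video = b"\x00" * 5_000_000  # placeholder for a large media artifact

ref = store_blob(store_dir, video)
# Only this short JSON string would be written to the workflow table.
workflow_state = json.dumps({"step": "transcode", "input": ref})
```

Using a content hash as the key is one design choice among several. It makes repeated writes of the same payload idempotent, which suits retried steps, but a generated ID or an object-store path would serve the same purpose.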

Where Postgres workflows fit

Postgres workflows fit best when the workflow is part of your application, the steps can be made idempotent, and the team can model retries and checkpoints in SQL without turning the main database into a dumping ground.

The practical read

Use this pattern when the workflow is close to your product and your team already knows how to operate Postgres under load. This is a strong fit for internal job pipelines, AI agent tasks, document processing, notification chains, and service-local background work.

Be more cautious when the workflow spans many teams, languages, approval states, and long-running human processes. A dedicated workflow system may earn its weight there, especially if it gives you mature tooling around versioning, visibility, timeouts, and operator workflows.

The test is not ideological. Sketch one real workflow. Count the steps. Write down what each step stores, how it retries, what happens after a worker crash, and where large payloads live. If that design fits naturally into Postgres tables and constraints, DBOS’s argument deserves a serious look. If the model starts turning into a private orchestration platform, buy or adopt the platform instead.

For app builders, the app store optimization (ASO) angle is indirect but real: background reliability is becoming part of product discovery. Users do not search app stores for “durable execution,” but they do notice when uploads, agent runs, and media processing quietly resume instead of failing.
