AI harness design is becoming the real software moat

AI harness

Tomasz Tunguz argues that the next software fight is moving away from polished SaaS screens and toward the AI harness, the operating layer that turns an LLM into something closer to a dependable worker. His useful framing is simple: models are powerful, but production agents need context, tools, memory, sandboxes, logs, policy, and cost control before they can handle real work.

The short version: AI harness

  • Tunguz describes seven parts of an AI harness: context and memory, tools and action, orchestration, state, sandboxed compute, observability, and cost-aware workflow design.
  • The argument is less about replacing SaaS overnight and more about where software products now create value: in the runtime around the model.
  • For builders, the hard part is no longer choosing a model alone. It is deciding what the agent can see, what it can do, when it stops, and who can audit it later.
  • The startup opening is domain depth. If everyone can rent similar models, the product edge shifts toward messy workflow knowledge and safe execution.

What happened

Tunguz published “Software After AI,” a short essay on May 27, 2026, about the stack that sits around AI agents. The piece uses the word “harness” deliberately. A raw model can answer questions, but a working product has to constrain that model, feed it the right business context, expose tools safely, resume work after failures, and leave an audit trail.

The seven-part list is practical rather than futuristic. Context and memory cover retrieval, short-term task history, and the company-specific recipes people usually keep in their heads. Tools and action cover registries, argument validation, approvals, dispatch, and failure handling. Orchestration covers the think-act-observe loop. State and persistence cover checkpoints and artifacts. Sandbox and compute cover isolated workspaces and credentials outside the model. Observability and governance cover tracing, evals, guardrails, and human review. Cost and workflow optimization cover the decision of which steps should be deterministic, which model should run each step, and where knowledge should live.

Why this is worth watching

The term AI harness is useful because it names the part of agent software that demos often hide. A demo can succeed once with a clever prompt. A product has to succeed repeatedly when the CRM record is stale, the tool call fails, the user asks for a risky change, or the model forgets what it was doing three steps ago.

That is where the SaaS comparison gets interesting. Traditional SaaS products gave users a fixed interface over a database and a workflow. Agent products may hide more of the interface, but they cannot hide responsibility. If an agent refunds a customer, rewrites a contract, changes a cloud setting, or files a report, the company still needs permissions, logs, rollback paths, and a way to explain what happened.

This is also a decent filter for AI product pitches. If a vendor talks only about the model, the demo, or a benchmark, the product may still be thin. The durable work is in the boring layer: retrieval quality, tool boundaries, state recovery, sandbox rules, evals, and unit economics. Readers who track AI infrastructure and developer tooling can find more coverage in the IT & AI archive.

What the discussion is missing

I could not find a dedicated Hacker News thread for this exact article. That absence is a little unfortunate, because the strongest debate would probably be among people building agents in production rather than people judging them from a launch video.

The missing questions are the useful ones. How much of this AI harness should be a platform, and how much has to be custom per industry? Will MCP-style tool registries make agents safer, or will they mostly make unsafe access easier to wire up? Can evals catch the failures that matter in legal, medical, finance, or customer operations? And at what point does the harness become so complex that a deterministic workflow would have been cheaper and safer?

Those are not objections to Tunguz’s framing. They are the next layer of the conversation. The essay says the harness is the new software battleground. The harder question is which parts of that battleground can be standardized.

The practical read

If you are building an agentic product, start with the AI harness before you polish the chat surface. Write down the tools the agent can call, the data it can read, the approvals it needs, the state it must preserve, and the failure cases it must recover from. Then decide which model belongs in each step.

If you are buying AI software, ask a different set of questions. Do not stop at “Which model powers this?” Ask what context system it uses, how tool calls are logged, how sensitive actions are approved, how tasks resume after a crash, how evals run, and how costs are controlled as usage grows.

And if you are a startup, the point is not to out-model the labs. You probably will not. The better bet is to know a workflow so well that your AI harness handles the annoying exceptions, handoffs, and audit needs that a general-purpose agent will miss.

Sources