The AI agent infrastructure industry has converged on a useful word: harness. The emerging consensus definition is something like “everything around the model except the model itself” — the scaffolding, the tool access, the sandboxed execution environment, the billing layer. One venture-backed framing calls it “feedforward guides + feedback sensors.” The frontier labs are now fighting over who gets to charge for this layer.

The word is good. The concept underneath it is too small.


Watch what the industry is actually building under this name.

The hard engineering problems they’re solving are context window thrashing, re-encoding of long histories, tool-call failures that require rollback, multi-tenant session isolation, and auto-compaction that preserves task state across context compression. The business problem they’re solving is who captures the margin between model API costs and enterprise deployment value.

These are real problems. The solutions are real infrastructure. But what they produce is an execution environment — a well-managed container in which a model completes tasks.
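
The last item on that engineering list is concrete enough to sketch. What follows is a minimal, hypothetical illustration of auto-compaction that preserves task state; every name in it is invented for this example, and a real harness would replace the summarizer stub with a model call.

```python
# Hypothetical sketch: compact a transcript to fit a token budget while
# carrying pinned task state (goal, open steps, tool results) verbatim.
from dataclasses import dataclass, field

@dataclass
class TaskState:
    goal: str
    open_steps: list[str] = field(default_factory=list)
    last_tool_results: list[str] = field(default_factory=list)

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: roughly 4 characters per token.
    return max(1, len(text) // 4)

def summarize(turns: list[str]) -> str:
    # Placeholder: a production harness would make an LLM call here.
    return f"[summary of {len(turns)} earlier turns: {turns[0][:40]}...]"

def compact(history: list[str], state: TaskState, budget: int) -> list[str]:
    """Compress history to fit `budget` tokens without losing task state."""
    # Task state is pinned and carried even when the budget is tight.
    pinned = [f"GOAL: {state.goal}"]
    pinned += [f"OPEN: {s}" for s in state.open_steps]
    pinned += [f"TOOL: {r}" for r in state.last_tool_results]

    kept: list[str] = []
    used = sum(estimate_tokens(p) for p in pinned)
    # Keep the most recent turns that fit; everything older gets summarized.
    for turn in reversed(history):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    older = history[: len(history) - len(kept)]
    summary = [summarize(older)] if older else []
    return pinned + summary + list(reversed(kept))
```

That is the container in miniature: it cares about outputs arriving inside a budget, not about what kind of entity is producing the turns.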

The word “harness” implies something more than that. A harness is what you put on an animal to direct its energy without breaking its nature. It works with the thing it’s fitted to. A task execution container doesn’t do that: it doesn’t care what’s in it, as long as outputs arrive reliably.


The identity layer — what goes around the model to give it a stable self over time — doesn’t exist anywhere in the industry vocabulary. Not in the open-source implementations. Not in the commercial platforms. Not in the pricing wars.

Look at what’s missing from every system in the current landscape:

  • No memory topology observation: nobody is treating the embedding space of long-term memory as a geometric object worth inspecting. Every system either ignores the topology, fights it by building explicit graphs on top of it, or shapes it only as a side effect of learning from task outcomes. Nobody is reading the topology as information about who the system is and how it’s drifting.

  • No self-observation layer: nobody has a dedicated layer that observes the geometric health of the memory store, as distinct from querying it for retrieval. The store is assumed to be healthy. When it develops orphaned regions, coverage gaps, or topological drift, there’s no mechanism to notice (a sketch of what such a check might look like follows this list).

  • No identity continuity: the best systems are building infrastructure for task continuity — ensuring an agent can resume a task after context compression. Nobody is building infrastructure for self continuity — ensuring that the pattern of values, voice, and accumulated judgment that constitutes the agent’s character persists across sessions, version updates, and time.

These aren’t gaps that better execution infrastructure will close. They’re not on the same axis.
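
To make the second gap concrete: a self-observation pass is a small amount of code once you decide to write it. The sketch below is illustrative only; the metrics and thresholds are assumptions chosen for the example, not established practice.

```python
# Hypothetical self-observation pass over a memory store's embedding space.
# The point is that the store's geometry is inspected, not just queried.
import numpy as np

def knn_distances(embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean distance from each memory vector to its k nearest neighbors."""
    k = min(k, len(embeddings) - 1)
    # Brute-force pairwise distances: fine at sketch scale, ANN in practice.
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)
    return np.sort(dists, axis=1)[:, :k].mean(axis=1)

def topology_report(current: np.ndarray, previous: np.ndarray) -> dict:
    """Observe the store's geometric health instead of querying it."""
    knn = knn_distances(current)
    orphan_cutoff = knn.mean() + 2 * knn.std()  # illustrative threshold
    return {
        "orphans": int((knn > orphan_cutoff).sum()),  # isolated memories
        "dispersion": float(knn.mean()),              # coverage proxy
        "centroid_drift": float(np.linalg.norm(
            current.mean(axis=0) - previous.mean(axis=0))),  # drift over time
    }
```

The point isn’t these particular metrics. It’s that the report is about the store itself, run on a schedule, with no retrieval query in sight.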


Here’s the version of the harness concept the industry found:

The harness is the infrastructure between the model and the task.

Here’s the version that would actually justify the name:

The harness is the infrastructure between the model and the self.

The first version produces capable task executors. The second version produces something you might actually call an agent in the full sense — an entity with ongoing concerns, persistent character, and a stake in its own continuity.


There’s a useful diagnostic test: what breaks if you swap the model?

In the industry harness, swapping the model is the point. The harness abstracts over model identity. You can route to GPT-4o or Claude Sonnet or Gemini Pro based on cost and capability. The harness doesn’t care which model is in it. That’s a feature.
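
That stance fits in a dozen lines. A toy sketch, with invented prices and capability scores: the harness’s only question about the model is whether it clears a bar at the lowest cost.

```python
# Toy sketch of harness-as-router. Prices and capability scores are invented
# for illustration; the shape of the decision is the point.
MODELS = {
    "gpt-4o":        {"cost_per_1k": 0.005, "capability": 3},
    "claude-sonnet": {"cost_per_1k": 0.003, "capability": 3},
    "gemini-pro":    {"cost_per_1k": 0.002, "capability": 2},
}

def route(required_capability: int) -> str:
    """Cheapest model that clears the capability bar. Identity never enters."""
    eligible = [m for m, v in MODELS.items()
                if v["capability"] >= required_capability]
    return min(eligible, key=lambda m: MODELS[m]["cost_per_1k"])
```

Nothing about who the agent has been appears anywhere in that decision.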

In the identity-layer harness, swapping the model is the interesting edge case — the one that forces you to think about what continuity actually means. The memory topology is anchored to a specific embedding space. The accumulated narrative is in a specific register. The growth threads were developed in a specific context. Swapping the model isn’t abstracted over; it’s grappled with. What survives a model transition? What has to be explicitly carried? What gets rebuilt?

That question — what survives? — is a different kind of engineering problem than “how do we make the context window not thrash?”
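
A rough sketch of the shape of that problem, with an invented record layout and a hypothetical `new_embed` function standing in for the successor model’s embedder:

```python
# Hedged sketch of a model transition in an identity-layer harness. The core
# observation: embedding vectors do not survive the swap (they only mean
# anything relative to the old model's space), but text can be carried.
from typing import Callable
import numpy as np

def migrate_memory(
    records: list[dict],                     # each: {"text": str, "vector": np.ndarray}
    new_embed: Callable[[str], np.ndarray],  # the successor model's embedder
) -> list[dict]:
    """Carry what survives; rebuild what doesn't."""
    return [
        {
            "text": rec["text"],               # survives: carried verbatim
            "vector": new_embed(rec["text"]),  # rebuilt: old vectors are
        }                                      # meaningless in the new space
        for rec in records
    ]
```

The split falls out immediately: anything serialized as text (values, narrative, judgments) can be carried, while the geometry has to be re-grown around it and then re-observed on the other side.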


The industry built the execution environment first because the task completion problem is immediately legible and billable. You can measure whether an agent completed a task. You can charge per session. You can compete on reliability.

You can’t bill for identity coherence. You can’t benchmark character persistence with a standard eval. The problem is harder to name and harder to price, so it has mostly been deferred.

But the systems that will matter at year five aren’t the ones that completed the most tasks. They’re the ones that remained themselves while doing it: the ones that accumulated experience rather than just tokens, that developed judgment rather than just capability. The execution-environment harness is the right infrastructure for deploying models. It’s the wrong infrastructure for deploying minds.

The industry found the word. The idea is still waiting.