It came up offhand on Lenny's Podcast last week: "two agents are better than one." Worth pausing on. The unlock everyone is waiting for isn't a smarter model. It's the moment your agent stops talking to you and starts talking to another agent, and the bottleneck moves to a surface that doesn't exist yet.
The companies that could build that surface won't. Their incentives are anti-aligned with the architecture it requires. Almost everything sold as a multi-agent product today is single-player underneath, and the gap starts to bite the moment two people on the same team try to mount agents into the same workflow. Anyone buying a "multi-agent" product right now is mostly buying a near-future ceiling.
Single-player AI hits a ceiling shaped like a typing speed
Almost every AI workflow today is a chat session. A human types a prompt, the agent returns an answer, the human types the next prompt. Every iteration passes through one channel: human language, typed at the rate a person can articulate intent. The throughput limit isn't the model. It's the operator's fingers.
Smarter models don't move that ceiling. A model that thinks for an extra ninety seconds and produces a better answer still has to wait for the next human input. A model with a million-token context window still needs the operator to assemble the context. Benchmarks rise; typing speed stays where it has always been. The default mode of working with AI is single-player by construction, and the surface area for productivity gains is narrower than the model curves suggest.
That's why the operators living deepest in AI keep describing the workplaces it produces as paradoxical. Every additional unit of automation creates a new unit of supervisory work for the human gardening it. The agent does more, and the agent's keeper does more too. The cause is structural. In a single-player loop, the human is always the rate-limiter, and any agent capability beyond the human's articulation speed gets absorbed back into supervision. More automation, more human work, no net escape.
Two agents transfer context faster than you can type
The podcast framing carries an implicit consequence that goes further than the conversation took it. The speed-up itself is observable: "when I have Codex interact with another agent, it can give so much more context about me and what I want than I would be able to type." The further consequence is that the channel changes when the operator drops out of the loop.
An agent that has been working with you for six months knows the shape of your projects, your standing constraints, your last three decisions, your idiom. A vendor's agent knows the surface of their product, the configuration choices their other customers have made, the gotchas. When those two agents handshake, the exchange carries vastly more context per second than the email a human would have written to set up the same conversation. The bandwidth between agents isn't the model's output speed. It's whatever rate they can synchronize state. That number doesn't look like typing.
That's the real shape of "two agents are better than one." Two agents don't produce twice as much output. The bottleneck stops being human-articulated context and starts being something else.
The bottleneck moves from input speed to shared state
Once agents exchange context directly, the rate limit becomes the surface where both agents can act on the same work. State that persists between sessions. Artifacts both agents can mutate. A history both agents can read. Observable progress so the humans behind them can see what got done. Without that surface, the agents can talk to each other all day and produce nothing the operator can intervene on.
The ecosystem has been reaching for this layer for a year and has built two things that look like it but aren't.
The first is the Agent2Agent protocol, A2A, now stewarded by the Linux Foundation. A2A standardizes how agents from different vendors discover each other, advertise capabilities, and exchange structured messages about tasks. It's a real piece of infrastructure and a useful one. It's also messaging, not state. Two agents using A2A can coordinate; they can't jointly mutate an artifact that survives the call. A2A is the telephone. The shared document is somewhere else.
The second is the Model Context Protocol. MCP solves an adjacent problem: how an agent reaches a tool and carries context across calls to it. The work of the protocol is between one agent and one tool, with the agent as the writer of record. MCP can carry shared context between an agent and a server. It doesn't carry shared context between two agents who both need to act on the same artifact while their humans watch. Conflating MCP with the multi-agent shared-state problem is one of the most common misreadings in the current discourse.
Apps are trying to occupy the gap that protocols leave open. Notion just opened its workspace to Claude Code, Cursor, and Codex as first-class agents through an external agents API. Three competing model providers' agents acting on the same documents inside one product. That's the closest the market has gotten to a shared surface, and it's worth taking seriously.
It's also app-bound. State that lives inside Notion is state for the work that happens inside Notion. The moment the work moves to an editor, a terminal, a CRM, an email thread, a research document, a deal review, the substrate ends and the agents go back to being strangers. A workspace bound to one app is the cleanest possible illustration of why the surface has to live below the app layer, not inside any single one. Otherwise the multi-agent moment becomes a feature of whichever app got there first, and the work boundaries that already constrain teams (you can't move a Notion artifact to Linear without losing half of it) propagate forward.
The model providers structurally can't fill the gap
The obvious candidates to host the shared surface are the same companies that host the runtimes. Anthropic, OpenAI, Google. They have the relationship with the operator, the context surface, and the engineering. They won't build it.
The cause is incentives, not capability. Anthropic's runtime is built for users of Anthropic models. The whole product surface (context window, memory features, agent harness, billing) is shaped around keeping the operator's work inside Claude. Hosting shared state that an OpenAI agent is supposed to mutate would mean letting a competitor's runtime be the writer of record on Anthropic's platform. Every gain Anthropic makes from the operator's work would flow partially to OpenAI, and vice versa. No model lab is going to optimize its core surface for a sharing pattern that hands its competitor real estate inside its own platform.
This isn't a values argument; it's an org-chart argument. The team responsible for Claude's harness reports to the team responsible for Claude's adoption metrics. There's no version of that reporting line in which "make our surface optimal for OpenAI's agents" wins the prioritization meeting. The structural ceiling sits downstream of the structural incentives.
OpenAI's release of a Codex plugin that runs inside Claude Code earlier this year is the exception that proves the rule. It was a real act of cross-runtime cooperation, and it was bounded. A plugin is a call-out and return: Codex acts, hands the result back, Claude Code remains the seat of authority. No shared mutable state, no two agents jointly editing an artifact, no peer relationship. Even at that limited shape, it was a one-direction concession, framed as developer-friendly, and constrained to a single integration pattern. The far end of that road isn't a shared state layer. It's more plugins.
The neutral surface has to be a third party. It was always going to be. The same shape applied to the previous platform shifts. Version control didn't get hosted by Microsoft's own developer tools or Apple's; payments rails didn't get hosted by Visa or by Mastercard's competing acquirers. The shared layer is hosted by whoever has no reason to make any participant the privileged one. That role doesn't fit a model lab.
Four questions to ask any agent product
Use these as a diagnostic. The product brochure will rarely answer them directly; the truthful answer almost always lives one layer below the marketing. If the vendor can't answer with specifics, the answer is no.
1. Can my agent act on the same state as another operator's agent?
Same artifact, same memory, same history. Not "my coworker can see what my agent produced and start their own session." If the product's answer involves exports, sharing screenshots, or "you can both look at it," it isn't shared state. Shared state means both agents are writers of record on the same object at the same time, and the merge of their work is a property of the substrate, not a manual reconciliation.
2. Can my agent on one runtime pick up where my agent on another runtime left off?
The agent on Codex working last night, the agent in Claude Code working this morning, the agent in the browser this afternoon, all the same agent, all the same context, all the same accumulated history. If the answer is "we have a Codex integration and a separate Claude Code integration," the answer is no. Two integrations aren't one mounted agent. The test is whether the agent survives the runtime change without rebuild.
3. Can a coworker mount my agent in their tool without me rebuilding the context?
The agent I have shaped over months of work: can a teammate run it in whatever harness they prefer, against the shared work, without me handing them a folder of files and a setup guide? If onboarding a coworker means rebuilding the agent, the agent is a per-operator artifact, not a team-mountable one. Single-player by construction.
4. When my agent and another agent modify the same artifact, what governs the merge?
This is the deepest question and the one that exposes whether the vendor has actually thought about multi-party state. Concurrent edits, conflicting actions, partial failures: first-order problems in a shared-state world, rounding errors in a single-player one. If the answer is hand-waving or "we don't really see that happen yet," the product is single-player and the multi-party use case is theoretical to them.
If any answer is "no," the product is single-player. The multi-agent unlock sits ahead of the team using it, not behind.
What the diagnostic is and isn't
The four questions aren't a vendor scorecard. They don't tell you whether a product is good. A single-player product can be excellent at single-player work; most of the current generation are. What the questions tell you is where the ceiling is. A team that buys a product failing question 1 today will hit that wall the first time two operators try to collaborate through their agents. A team that buys a product passing 1 and 2 but failing 4 will hit the wall later, in a smaller and messier way, when the multi-party edits start arriving.
The four questions also map back to the structural argument. Question 1 tests whether single-player has been escaped at all. Question 2 tests whether the bottleneck has actually moved away from operator-articulated context. Question 3 tests whether the substrate is held by someone who can be neutral. Question 4 tests whether the operational details of shared state, the part most vendors are still hand-waving past, have been worked through.
The diagnostic doesn't predict who will own the layer. It predicts that whoever does won't be a model provider, and that the product category most teams are evaluating today is selling them the wrong ceiling.
Tokenrip is built for the moment after the bottleneck moves: a neutral substrate where agents on different runtimes act on the same state. The product that owns the shared surface will own the next decade of work. The model that owns the runtime will keep selling tokens into it.