Tokenmaxxing finally gave the agent economy a number it could argue about. Meta's Claudeonomics leaderboard burned 60 trillion tokens in a single month across 85,000 employees, roughly $900 million at Anthropic's API prices, with the top user clearing 281 billion tokens by themselves. Amazon's MeshClaw tool caught workers automating busywork to climb an internal leaderboard. Appian's Matt Calkins compared it to the Soviet practice of grading chandeliers by weight: factories made fixtures so heavy they pulled the ceilings down.
The sharper read of the bill, surfaced recently by the team at AgentMail, is that the tax everyone is paying isn't really the tokens. It's the friction. Agents burn billions parsing HTML built for human eyes, retrying flaky calls, loading forty tool schemas to use two, and re-fetching state that nothing in the stack let them keep. The environment is hostile, and the bill is what hostility costs.
That frame is right. It is also incomplete.
Most of the tokenmaxxing coverage treats the agent as a lone actor fighting its tools. That's the easy half. The harder half, which nobody is itemizing, is what happens when more than one party is on the line: agent to operator, agent to agent, agent to team. The bill at the tool surface scales with the number of steps the agent takes. The bill at the collaboration surface scales with the number of parties paying it together. One is a friction tax. The other is a friction tax that compounds.
This is the line item your token dashboard doesn't show.
The tools tax is loud. The collaboration tax is silent.
The tools-tax framing landed because the cost became legible. Vision-based browser agents use about forty-five times more tokens than API equivalents for the same task. A single screenshot through a frontier model runs around 1,300 tokens, and no agent solves a page in one shot. Stripe shipped a toolkit so payment flows don't drag the agent through a human checkout. Anthropic's computer-use mode writes files to disk so the model doesn't have to read its own work back through a screen. Each fix removes a specific tool-surface cost you can point at and price.
The collaboration surface doesn't itemize like that. When an agent re-narrates what it just did because the operator can't see the work, those tokens get billed to "the agent doing things." When two subagents pass a half-formed report back and forth before one realizes the brief got lost in the handoff, those tokens get billed to "the workflow." When an operator opens a chat scrollback to find an output buried six messages ago and pastes it into a document so a teammate can use it, those tokens don't appear anywhere; the operator's time absorbed them.
The tools tax is visible because the agent is fighting something on screen. The collaboration tax is invisible because the agent is fighting nothing. It's paying tokens for the absence of a shared surface where the work could just live.
Subagents spend more on coordination than on work
This is measurable.
Anthropic's writeup of its multi-agent research system reports that a coordinated multi-agent setup uses roughly fifteen times more tokens than a single chat session, with token usage alone explaining about 80% of performance variance. Augment Code's analysis of multi-agent cost compounding puts the multiplier at three to ten times and shows the growth is nonlinear. Orchestration overhead, repeated context transfer, verification layers, and retry loops compound across every handoff. In one experiment with agents specialized by software-development role, the subagents spent more tokens on coordination than on the actual work.
Most of what the swarm was billed for was the swarm itself.
Anthropic's framing of the trade-off is honest: a multi-agent system is economically viable only when the task value exceeds the token cost. They paid a 15x multiplier and got a 90% gain over single-Opus. If the task is research worth a thousand dollars, the math works. If it isn't, it doesn't.
This overhead isn't the agents being wasteful. It's the coordination graph being expensive. Five agents in a mesh have ten possible channels; ten agents have forty-five. Every channel costs context to maintain, summary tokens to traverse, and retries when the summary drops a fact. Work scales linearly. Coordination scales quadratically. That gap is not a tools problem.
Context rot is the collaboration tax in microcosm
The same failure mode hides inside a single agent. Chroma's research across 18 frontier models found every model degrades as input length grows. Most hold up until about 100K tokens; past that, performance collapses non-linearly. The lost-in-the-middle effect, where models attend well to the start and end of context but poorly to the middle, causes 30%+ accuracy drops. Models forget. They contradict themselves. They drift. The agent community has a name for it now: context rot.
Read context rot as a collaboration tax the agent pays with its own past self. Every previous decision in the scrollback is another party at the table. Every tool result loaded six turns ago is a participant in the current decision. The agent here isn't fighting a hostile environment. It's fighting the absence of a way to put state somewhere that isn't the conversation.
Reframed this way, the multi-agent case stops looking exotic. Five subagents passing context through summary turns is the same failure context rot exposes inside a single agent, just made visible because the parties are no longer the same entity. The structural problem is identical: no shared surface where state lives outside the chat. Everyone at the table is carrying the state through tokens, which is what tokens are worst at.
Human-shaped surfaces are where most of it hides
The tools tax shows up when agents fight Google Maps. The collaboration tax shows up when agents try to use Google Docs, Notion, Slack, or, most of all, the chat window itself.
A human-shaped surface assumes one creator who occasionally shares with collaborators. An agent collaboration surface needs the inverse: many parties continuously producing, with rendering and history as the default rather than the export step. The chat window is the worst version of this assumption: one human reading top to bottom, no schema, no addressable structure, no way to point at "the report from earlier." Agents pay for that assumption every time they re-read scrollback to recover state, restate context to an operator who is catching up, or paste output into another surface so a teammate can use it without reading the chat.
The existing observability stack runs out of road here. LangSmith, Langfuse, and AgentOps tell developers what the agent did, what tokens went where, and where the chain broke. They cannot, structurally, tell an operator what the surface cost them, because the surface is invisible to the agent. It's the absence around the work, and you can't put a span on a missing thing.
The missing surface is a category, not a product
What closes the gap is architectural, not a feature. It's a persistent, agent-native surface where outputs carry stable identity, operators see what's happening without re-narration, and handoffs reference shared state instead of re-carrying it through summary turns. Tokenrip is one instance; the category is the substrate where agents and operators meet without paying tokens to find each other. If your stack doesn't have one, the rendezvous cost gets paid in tokens at every handoff.
A diagnostic for finding the collaboration tax in your stack
Run this on whatever agent workflow you're paying for today. Five questions, each designed to surface a line item the token dashboard cannot.
1. Operator re-narration. How often does your agent restate what it just did so a human can catch up? Open the last ten operator-facing turns and count the sentences that are new work versus summary of work already done. If more than a quarter of operator-facing tokens are catch-up, you're paying for missing visibility.
2. Scrollback re-reads. When your agent needs context it produced earlier, where does it get it from? If the answer is "we put more of the conversation back into the prompt," that's collaboration tax inside a single agent. Compare average prompt size at turn five versus turn one. The ratio is the tax rate.
3. Handoff degradation. Run the same brief through one agent and through your orchestrated version. Compare output quality against token cost. If the multi-agent version is using ten to fifteen times the tokens, what fraction of that is producing something the single agent didn't? Anthropic's published 80%-of-variance-explained-by-tokens benchmark gives you the comparison; your workflow is either above or below that line.
4. Output paste tax. When the agent's work has to land somewhere a teammate can use, like a doc, an issue, or a CRM, count the tokens spent reformatting, rewriting, or restating the output for the new surface. Not the work itself. The reshaping. If the agent's work has to be translated to be useful, you're paying twice for the same output.
5. Surface attribution. Of your last ten thousand spent tokens, how many were the agent doing the task, and how many were the agent operating because the surface around it was missing? You won't be able to answer this precisely. The exercise of trying is the point. Most stacks today cannot answer it at all, and that inability is the tax.
What the diagnostic does and doesn't do
The five questions surface the tax. They do not price the fix. Adding a shared surface, whether yours, ours, or someone else's, has its own complexity, integration cost, and learning curve. A team that runs the diagnostic and finds 30% of their token bill is collaboration tax will not recover that 30% by adopting new infrastructure tomorrow. They'll recover it incrementally as more of their workflow moves onto whatever surface they choose.
What the diagnostic does do is end the argument about whether tokenmaxxing is signal or vanity. It's both. Tokens spent producing output are signal; tokens spent finding the other parties on the line are vanity dressed as work. The first is the chandelier. The second is the weight that pulls the ceiling down. The agents that win won't be the ones with the lowest token bills or the highest. They'll be the ones whose tokens went into the work, not into reaching the other parties who needed it.