In April, Amazon engineers discovered that their AI usage was being tracked, scored, and ranked on an internal leaderboard called MeshClaw, with a soft mandate that more than 80% of engineers should use AI tools weekly. Three weeks later, on May 15, 2026, HashiCorp co-founder Mitchell Hashimoto posted what would become one of the most-shared tech tweets of the quarter: "I strongly believe there are entire companies right now under heavy AI psychosis." Over a million engagements followed.
These two stories are the same story. The leaderboard creates the psychosis. The psychosis justifies the leaderboard. And the name we now use for the loop is tokenmaxxing: the act of inflating your AI usage score with activity that has no relationship to output. Engineers run unnecessary queries. Managers boast about prompt counts in board decks. Vendors price by the call instead of by the outcome. Tokenmaxxing is what happens when "AI productivity" becomes a vanity metric, and right now a large share of US tech is doing it without realizing it has a name.
This piece is the case against tokenmaxxing, the diagnostic for spotting it inside your own team, and the four outcome-based metrics that replace it.
What Tokenmaxxing Actually Is
Tokenmaxxing has its own Wikipedia entry now. The term started as a /r/programming joke in late March 2026, hardened into reporting language at Tom's Hardware and TechRadar in April, and went fully mainstream after WBUR's On Point framed it as "tech workers gamifying their way to unemployment." Built In and InfoWorld have running explainers. Every B2B SaaS leader in San Francisco knows the word by now.
The textbook definition: tokenmaxxing is when employees rack up AI usage (running queries, generating drafts, regenerating outputs, asking agents trivial follow-ups) primarily to satisfy an internal usage metric, not because the work needed doing. The deeper definition is more uncomfortable: tokenmaxxing is the inevitable outcome of measuring activity instead of outcomes inside a hype cycle where leaders have bet their jobs on the activity going up.
It is not a fringe problem. CNBC reported in May 2026 that a growing share of Fortune 500 companies are now tracking employee AI usage at the user level. BCG's AI Radar 2026 found that 72% of CEOs are now their company's main AI decision-maker, twice last year's share, and that half of them believe their job is on the line if AI doesn't pay off. When the CEO's job depends on AI ROI and the dashboard shows prompts per engineer, prompts per engineer go up. That's tokenmaxxing.
The MeshClaw Story Is the Tokenmaxxing Story
The Amazon reporting is worth understanding in detail because it is the canonical tokenmaxxing case. According to Fast Company, the internal tool MeshClaw tracks how often each engineer uses sanctioned AI tools, rolls the results into team-level and org-level rankings, and surfaces the leaderboard in front of skip-level managers. The 80%-weekly-active soft target is communicated as a "best practice." The 2,089-point Hacker News thread on the story is topped by comments from engineers describing exactly what you'd expect: queries are being run to clear the threshold, not to solve problems.
Fortune's coverage added the Wall Street angle. Analyst Gil Luria observed that the corporate logic — "a $500K engineer should burn $250K in tokens because the leverage ratio is obvious" — is increasingly a justification for capex commitments, not a productivity argument. The point is subtle and important. When the CFO has already signed a multi-billion-dollar GPU contract, the easiest way to show ROI is to drive consumption up. Tokenmaxxing turns engineers into the demand curve that justifies the supply contract.
HRD reported in May that internal HR teams at Amazon were the ones who built the leaderboard in the first place. This is the part that gets missed in the news coverage. Tokenmaxxing isn't an engineer-side abuse. It is a system that engineers are responding to rationally. If you tell knowledge workers that AI use is the metric, they will produce the metric. That's not a defect. That's the design.
Hashimoto's "AI Psychosis" Is the Same Mechanism
The Hashimoto thread that passed a million engagements wasn't actually about tokenmaxxing, but it described the upstream cause. Hashimoto's argument: "these companies believe it's fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do." The diagnosis is that leadership has confused "AI is fast at producing outputs" with "AI is fast at producing correct outputs," and the result is an internal narrative that justifies any quantity of AI work as automatically valuable.
Tokenmaxxing is the operational fingerprint of that narrative. If quantity of AI use is automatically valuable, then a usage leaderboard is automatically a productivity dashboard. If quantity of AI output is automatically valuable, then shipping more (and fixing it later with more AI) is automatically rational. The two beliefs reinforce each other, and the dashboard becomes the proof. This is the same pattern Gergely Orosz of The Pragmatic Engineer documented in his April analysis, and it is now showing up across Fortune 500 companies with AI mandates.
The most cited piece of evidence for the disconnect is the MIT NANDA project's 2025 study showing that 95% of enterprise GenAI pilots fail to deliver measurable P&L impact. McKinsey's State of AI 2026 found that 88% of organizations are deploying AI, but only 1% of C-suite leaders describe their rollout as mature, and only 19% report AI-accelerated revenue increases above 5%. The activity numbers are real. The outcome numbers are not. Tokenmaxxing closes the gap on paper. We covered the underlying mechanism in our analysis of why most AI pilots fail in 2026; tokenmaxxing is what happens when leadership refuses to accept those results.
Goodhart's Law Is Eating AI Adoption
Tokenmaxxing is the most public example of Goodhart's Law in the workplace since Wells Fargo's account-opening scandal. The original phrasing: "When a measure becomes a target, it ceases to be a good measure." Translated to AI: the moment "weekly active AI users" became the KPI, the KPI stopped being correlated with output.
This is what makes tokenmaxxing different from other forms of corporate theater. Most metric-gaming requires effort. Tokenmaxxing is frictionless. You open the IDE, you fire a useless query at the agent, the leaderboard updates. There is no countervailing cost to inflating the number, because token cost is absorbed by the central AI budget, not by the engineer's project. So the engineer's rational behavior is to inflate, especially under a mandate. Spark & Sterling's 2026 vanity-metrics teardown makes the same point in marketing language: any AI metric that doesn't connect to a downstream business outcome is just expensive activity dressed as progress.
The companies winning the AI cycle understand this. They aren't tracking prompts per engineer. They're tracking shipped-features-per-engineer, customer-outcome velocity, and time-to-first-shippable-artifact. They've moved the goalposts from input to outcome — and they're seeing the 66.2% epic throughput gains DORA reported in its 2025 State of AI-Assisted Software Development report. The difference between the tokenmaxxing companies and the winning companies isn't budget. It's measurement.
The 5 Symptoms Your Team Is Tokenmaxxing
The reason tokenmaxxing is so hard to spot from the inside is that the leaderboard, the dashboards, and the standup updates all look healthy. Activity is up. Tool adoption is up. Senior leadership is celebrating. To diagnose tokenmaxxing, you have to look one layer underneath. These are the five symptoms we see across teams that have crossed the line.
Symptom 1: The AI Usage Dashboard Exists, But the AI Outcomes Dashboard Does Not
Most companies that fall into tokenmaxxing built the input dashboard first because it was easy — Copilot reports usage, Claude reports tokens, ChatGPT Enterprise reports prompts. Building the outcome dashboard requires defining outcomes, which is hard. So leadership ends up reviewing the easy dashboard and pretending it answers the hard question.
Symptom 2: The Bug Rate Is Rising Along With the Velocity Story
This is the Hashimoto signal in operational form. When you read the engineering retro and the bug count is up 20% but everyone is celebrating "AI-accelerated shipping," you are inside a tokenmaxxing organization. The acceleration is real. So is the regression.
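Raw bug counts can rise simply because more work shipped, so it helps to normalize before drawing conclusions. Below is a minimal sketch, assuming you can export per-period counts of shipped items and of bugs traced back to them; the `Period` type, its fields, and the rule "throughput rose and bugs per shipped item rose" are illustrative assumptions, not an established standard. With a 20% rise in bugs against a 10% rise in shipped items, the check fires.

```python
from dataclasses import dataclass


@dataclass
class Period:
    """Hypothetical per-period export from an issue tracker."""
    label: str
    shipped: int  # items shipped in the period
    bugs: int     # bugs filed against work shipped in the period


def bug_rate(p: Period) -> float:
    """Bugs per shipped item (0.0 for a period that shipped nothing)."""
    return p.bugs / p.shipped if p.shipped else 0.0


def velocity_masks_regression(before: Period, after: Period) -> bool:
    """True when throughput rose but bugs per shipped item rose as well."""
    return after.shipped > before.shipped and bug_rate(after) > bug_rate(before)


q1 = Period("Q1", shipped=40, bugs=10)
q2 = Period("Q2", shipped=44, bugs=12)  # bug count up 20%, shipping up 10%
print(f"{bug_rate(q1):.3f} -> {bug_rate(q2):.3f}")  # 0.250 -> 0.273
print(velocity_masks_regression(q1, q2))            # True
```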
Symptom 3: Standup Updates List Prompts, Not Decisions
"I asked Claude to draft the migration plan" is a tokenmaxxing standup. "I closed the migration plan decision with the platform team" is an outcome standup. The first sentence shows AI activity. The second sentence shows the work that AI was supposed to enable. Listen for the difference.
Symptom 4: High-AI-Usage Engineers Are Not the Highest-Shipping Engineers
Pull the data from your project management tool. If your top quartile by AI usage doesn't match your top quartile by closed tickets, shipped features, or customer outcomes, you don't have an AI productivity story. You have a tokenmaxxing story. The correlation should be obvious. If it isn't, the metric is broken.
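The quartile comparison takes only a few lines once you have one usage number and one shipped-work number per engineer. A minimal sketch; the names and figures are made up for illustration. Alongside the top-quartile overlap described above, it reports a Spearman rank correlation, which is a summary statistic we are swapping in here rather than one the source prescribes. A correlation near zero or negative, with little or no quartile overlap, is the warning sign.

```python
from statistics import mean


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks.

    Undefined (ZeroDivisionError) if every value in x or in y is identical.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def top_quartile(scores: dict[str, float]) -> set[str]:
    """Names in the top 25% by score (at least one name)."""
    n = max(1, len(scores) // 4)
    return set(sorted(scores, key=scores.get, reverse=True)[:n])


# Made-up monthly numbers: AI requests and tickets closed per engineer.
usage = {"ana": 900, "ben": 850, "cy": 120, "dee": 300,
         "eli": 60, "fay": 410, "gus": 700, "hal": 200}
shipped = {"ana": 3, "ben": 2, "cy": 9, "dee": 6,
           "eli": 7, "fay": 5, "gus": 4, "hal": 8}

names = sorted(usage)
rho = spearman([usage[n] for n in names], [shipped[n] for n in names])
overlap = top_quartile(usage) & top_quartile(shipped)
print(f"Spearman rho = {rho:.2f}")                   # -0.90
print(f"Top-quartile overlap: {overlap or 'none'}")  # none
```

Swap "tickets closed" for shipped features or a customer outcome if that is the better proxy for your team.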
Symptom 5: AI Spend Is Rising Faster Than Feature Throughput
This is the CFO check. Notion just started metering Custom Agents at $10 per 1,000 monthly credits. Miro's AI Sidekicks are credit-metered. Loom's AI features are now Business-tier only. The whole industry is moving toward usage-based pricing, which means the cost of tokenmaxxing is no longer absorbed by the vendor; it's hitting your invoice. If your AI spend is up 4x and your shipped-feature throughput is up 1.2x, your AI cost per shipped feature has more than tripled (4 / 1.2 ≈ 3.3x). We unpacked this in the AI credit pricing trap.
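The 4x-spend, 1.2x-throughput example reduces to one division: the spend multiplier over the throughput multiplier gives the change in AI cost per shipped feature. A minimal sketch; the function name is hypothetical, and it assumes both multipliers cover the same period.

```python
def cost_per_feature_multiplier(spend_growth: float, throughput_growth: float) -> float:
    """Change in AI cost per shipped feature over a period.

    Both arguments are growth multipliers for the same period,
    e.g. 4.0 means "4x the previous period".
    """
    if throughput_growth <= 0:
        raise ValueError("throughput_growth must be positive")
    return spend_growth / throughput_growth


m = cost_per_feature_multiplier(spend_growth=4.0, throughput_growth=1.2)
print(f"AI cost per shipped feature is {m:.2f}x what it was")  # 3.33x
```

Anything well above 1.0 means each shipped feature is costing more AI spend than it used to.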
The Outcome-Based Replacement: Four Metrics That Beat Tokenmaxxing
Killing the leaderboard is the easy part. Replacing it is the work. The four metrics below are what we recommend to remote and hybrid teams who want to actually measure whether AI is paying off, and they map cleanly to a dashboard your CFO will respect.
Metric 1: Time-to-First-Shippable-Artifact (TTFSA)
Measure the wall-clock time from "task assigned" to "first reviewable deliverable in the hands of a teammate or customer." AI should compress this number. If TTFSA is flat or rising while AI activity climbs, your AI use is theater. The metric is independent of how many prompts were fired; it only measures whether the artifact arrived faster.
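TTFSA needs only two timestamps per task. A minimal sketch, assuming your tracker can export when each task was assigned and when its first reviewable deliverable (a draft PR, a shared doc, a demo link) reached someone else; the helper names and sample timestamps are hypothetical. Reporting the median instead of the mean is a design choice so that one stalled task doesn't swamp the number.

```python
from datetime import datetime
from statistics import median


def ttfsa_hours(assigned: datetime, first_review: datetime) -> float:
    """Wall-clock hours from task assigned to first reviewable deliverable."""
    return (first_review - assigned).total_seconds() / 3600


def median_ttfsa(tasks: list[tuple[datetime, datetime]]) -> float:
    """Median TTFSA across (assigned, first_review) pairs."""
    return median(ttfsa_hours(a, r) for a, r in tasks)


# Made-up samples; naive datetimes assumed to share one timezone.
before = [
    (datetime(2026, 3, 2, 9), datetime(2026, 3, 4, 15)),   # 54 h
    (datetime(2026, 3, 3, 10), datetime(2026, 3, 4, 10)),  # 24 h
    (datetime(2026, 3, 5, 9), datetime(2026, 3, 9, 9)),    # 96 h
]
after = [
    (datetime(2026, 5, 4, 9), datetime(2026, 5, 5, 13)),   # 28 h
    (datetime(2026, 5, 5, 10), datetime(2026, 5, 6, 16)),  # 30 h
    (datetime(2026, 5, 6, 9), datetime(2026, 5, 8, 9)),    # 48 h
]
print(median_ttfsa(before), median_ttfsa(after))  # 54.0 30.0
```

Note that wall-clock time includes nights and weekends, which is deliberate: the metric asks when the artifact arrived, not how many hours someone worked on it.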
Metric 2: Decision-to-Execution Ratio
This is the number of decisions logged per week divided by the number of meetings held per week. Coordinated, AI-amplified teams generate more decisions per meeting because they offload the prep, the research, and the draft to an agent before the sync. Tokenmaxxing teams generate more meetings and the same decisions. We use this metric internally and wrote about it in the meeting debt framework.
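The ratio is easy to compute once decisions and meetings each carry a date. A minimal sketch, assuming both are exported as lists of dates; the function names and sample data are hypothetical. It buckets by ISO week and leaves out weeks with no meetings instead of dividing by zero.

```python
from collections import Counter
from datetime import date


def iso_week(d: date) -> tuple[int, int]:
    """Bucket key: (ISO year, ISO week number)."""
    year, week, _ = d.isocalendar()
    return (year, week)


def decisions_per_meeting(
    decisions: list[date], meetings: list[date]
) -> dict[tuple[int, int], float]:
    """Decisions logged in each week divided by meetings held that week.

    Weeks with no meetings are left out instead of dividing by zero.
    """
    decision_counts = Counter(iso_week(d) for d in decisions)
    meeting_counts = Counter(iso_week(m) for m in meetings)
    return {
        week: decision_counts[week] / n
        for week, n in sorted(meeting_counts.items())
    }


# Made-up log: a meeting-heavy week followed by a decision-heavy week.
meetings = [date(2026, 5, 4), date(2026, 5, 5), date(2026, 5, 6), date(2026, 5, 7),
            date(2026, 5, 12), date(2026, 5, 14)]
decisions = [date(2026, 5, 5), date(2026, 5, 7),
             date(2026, 5, 12), date(2026, 5, 13), date(2026, 5, 14)]
print(decisions_per_meeting(decisions, meetings))
# {(2026, 19): 0.5, (2026, 20): 1.5}
```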
Metric 3: Customer Outcome Velocity
Look at NPS movement, expansion revenue, support ticket reduction, or whatever your product's leading indicator is — and ask whether AI investment is moving that number. This is the only metric that ultimately matters, and it is the one most resistant to gaming. Tokenmaxxing dies the moment leadership starts asking "show me which customer outcome this prompt count produced."
Metric 4: AI-Amplified Throughput on the 20% Architecture Slice
Most engineering work doesn't need AI. The work that does is the 20% that is architectural, ambiguous, or cross-team. Measure AI's impact only on that slice — and you'll see who is using AI for leverage versus who is using AI to clear a leaderboard. We covered the underlying logic in our analysis of the AI code review bottleneck.
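One way to scope the measurement to that slice is to tag tasks and compare cycle times only within the tagged set. A minimal sketch; the tag names, the `Task` fields, and the "median without AI over median with AI" speedup ratio are all hypothetical choices, not a standard metric. Grouping the same calculation by engineer would give the per-person view described above.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional

# Hypothetical tags marking the architectural, ambiguous, or cross-team slice.
SLICE_TAGS = {"architecture", "ambiguous", "cross-team"}


@dataclass
class Task:
    tags: set[str]
    ai_assisted: bool
    cycle_days: float  # assigned -> shipped


def slice_speedup(tasks: list[Task]) -> Optional[float]:
    """Median cycle time without AI divided by median with AI, slice only.

    Values above 1.0 mean AI-assisted slice work shipped faster.
    Returns None when either group is empty.
    """
    in_slice = [t for t in tasks if t.tags & SLICE_TAGS]
    with_ai = [t.cycle_days for t in in_slice if t.ai_assisted]
    without_ai = [t.cycle_days for t in in_slice if not t.ai_assisted]
    if not with_ai or not without_ai:
        return None
    return median(without_ai) / median(with_ai)


tasks = [
    Task({"architecture"}, True, 6.0),
    Task({"cross-team"}, True, 4.0),
    Task({"architecture"}, False, 10.0),
    Task({"ambiguous"}, False, 8.0),
    Task({"bugfix"}, True, 0.5),   # outside the slice, ignored
    Task({"bugfix"}, False, 1.0),  # outside the slice, ignored
]
print(slice_speedup(tasks))  # 1.8
```

Because it compares medians across different tasks rather than running a controlled experiment, treat the result as a prompt for a closer look, not a verdict.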
This is where the case for outcome-native collaboration tools gets specific. Tokenmaxxing thrives in stacks where AI activity happens in one tool, the work happens in another tool, and the decision happens in a third. Coommit was built so that the canvas, the video conversation, and the AI assistant all sit on the same surface — meaning the AI's output is visible as the artifact, not as a prompt count buried in a usage report. That doesn't make tokenmaxxing impossible. It just makes it visible, which is enough.
The Tokenmaxxing Detox: What to Do Monday Morning
If you read the above and recognized your own org, the playbook is short. Kill the usage leaderboard this week. Pull the top three highest-AI-usage engineers from last month and ask them privately to walk you through one shipped feature each — if they can't, you've confirmed tokenmaxxing. Swap one usage-based dashboard for one outcome-based dashboard (TTFSA is the easiest start). Reword your AI mandate from "weekly active users" to "demonstrate one AI-amplified outcome per quarter." And finally, stop measuring AI in your leadership KPIs entirely until you have an outcome metric to anchor it to. The signal you send by removing the vanity metric is more valuable than any dashboard.
The companies that survive the next 18 months of the AI cycle will not be the ones with the highest prompt counts. They will be the ones who refused to confuse activity with productivity, who killed their tokenmaxxing dashboards before the CFO started asking why the AI bill doubled and the feature throughput didn't. Tokenmaxxing is the loudest signal we have that the industry has misread the moment. The fix is not less AI. It is more honest measurement of what AI is actually doing.