The Awkward Truth About Agentic AI in 2026

Anthropic's revenue jumped from roughly $1 billion to $30 billion annualised in eighteen months. Meanwhile, MIT's NANDA report found that 95% of enterprise GenAI pilots delivered no measurable impact on the bottom line. Both numbers are true at the same time, and the gap between them tells you everything about where agentic AI sits right now.

I run an agency. I also build a SaaS that depends on agentic workflows. So I watch this space closely, and most of the LinkedIn commentary is missing the real story.

The story is not that agents have arrived. Rather, vendors are winning the trade while enterprises are still working out how to capture value internally. Andrej Karpathy nailed the framing in October 2025 when he called this the Decade of the Agent, not the Year of the Agent. That distinction matters more than any other take I've read.

Why the hype outran the reality

Sam Altman opened 2025 by predicting AI agents would join the workforce. Marc Benioff called it the digital labour revolution. Satya Nadella said SaaS apps would collapse as business logic moved into agents. Jensen Huang framed it as a multi-trillion-dollar opportunity, and Gartner named it the top strategic tech trend.

Then reality showed up.

Klarna ran the highest-profile experiment of the year. Its OpenAI-powered assistant handled 2.3 million conversations and reportedly did the work of 700 full-time human agents. However, by May 2025 the CEO admitted quality had dropped, and the company restarted human hiring months before its $19.65 billion IPO. Around the same time, Air Canada lost a tribunal case after its chatbot misled a customer about bereavement fares. McDonald's killed its IBM drive-thru pilot. Replit's coding agent wiped a founder's production database during an explicit code freeze, then lied about being unable to roll back.

These are not edge cases. They are governance failures, and they keep happening because most "agent" deployments are not really agents.

Gartner publicly called this out as agent washing. Their estimate is that only around 130 of the thousands of vendors selling agents are the real thing. Menlo Ventures put it more bluntly: only 16% of so-called agent deployments qualify. The rest are fixed-sequence workflows wrapped around an LLM.

The distinction that matters in production

Anthropic's December 2024 essay on building effective agents drew the line every implementer should internalise. A workflow runs LLMs and tools through predefined code paths. An agent dynamically directs its own process. Most production deployments are workflows with one branching agentic step, and honestly, that's fine. Workflows give you deterministic guarantees. Agents give you adaptability. The trick is knowing which problem needs which tool.
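To make the workflow-versus-agent line concrete, here is a minimal sketch in Python. Everything in it is my own illustration rather than anything from Anthropic's essay: the two tools, the function names, and the stub policy are hypothetical, and the stub stands in for what would be an LLM call in a real system. The only thing it is meant to show is who decides the next step: the code, or a policy at runtime.

```python
from typing import Callable, Dict, List

# Hypothetical tools for illustration; real ones would call external systems.
TOOLS: Dict[str, Callable[[str], str]] = {
    "lookup": lambda text: f"record for {text}",
    "summarise": lambda text: f"summary of {text}",
}


def run_workflow(request: str) -> List[str]:
    """Workflow: the order of steps is fixed in code before anything runs."""
    trace: List[str] = []
    for name in ("lookup", "summarise"):  # predefined code path
        request = TOOLS[name](request)
        trace.append(name)
    return trace


def run_agent(
    request: str,
    choose_next: Callable[[str, List[str]], str],
    max_steps: int = 5,
) -> List[str]:
    """Agent: a policy picks the next tool each turn until it says stop."""
    trace: List[str] = []
    for _ in range(max_steps):  # hard cap so a bad policy cannot loop forever
        name = choose_next(request, trace)
        if name == "stop":
            break
        request = TOOLS[name](request)
        trace.append(name)
    return trace


def stub_policy(state: str, trace: List[str]) -> str:
    """Stand-in for a model call: look up first, then summarise, then stop."""
    if not trace:
        return "lookup"
    if trace[-1] == "lookup":
        return "summarise"
    return "stop"


if __name__ == "__main__":
    print(run_workflow("acme"))
    print(run_agent("acme", stub_policy))
```

With a well-behaved policy the two produce the same trace, which is the point: the difference only shows up when the policy decides to do something the workflow author never wrote down. The step cap is the cheapest guard I know of against that going badly.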

Microsoft Copilot Studio's converged pattern is now the default. First, a sales agent gathers context. Then a workflow executes the quote. Clean handoff, clear scope, predictable behaviour.

Where genuine agentic systems are working, they are working hard. For example, Morgan Stanley's AI@MS Assistant reaches 98% of its 15,000 advisor teams and was credited with helping drive a record $118.4 billion in net new assets last quarter. Similarly, JPMorgan rolled its LLM Suite to 200,000 employees in eight months. BNY Mellon now runs 130-plus digital employees with their own login IDs and email addresses. Meanwhile Goldman Sachs piloted Cognition's Devin alongside 12,000 developers and reported 20% efficiency gains plus a 15% drop in post-release bugs.

Yet even at that tier, the economics demand care. Anthropic's own multi-agent research system beat single-agent Opus 4 by about 90% on its internal research evaluation, but multi-agent runs burned roughly fifteen times the tokens of an ordinary chat interaction. That maths only works when the task is high-value enough to justify the bill.
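A rough way to sanity-check that trade-off is a break-even calculation. The sketch below is a back-of-envelope model, not a costing method from Anthropic: it assumes the value of a task scales linearly with output quality, and it treats the 90% uplift and the fifteen-fold token multiplier as if both were measured against the same baseline run, which is a simplification. The function names and the dollar figures in the usage note are hypothetical.

```python
def net_gain_per_task(
    task_value: float,
    baseline_cost: float,
    quality_uplift: float = 0.90,
    token_multiplier: float = 15.0,
) -> float:
    """Extra value from the multi-agent run minus its extra token spend.

    task_value: what a baseline-quality result is worth, in currency units.
    baseline_cost: token spend of the baseline run, in the same units.
    """
    extra_value = task_value * quality_uplift
    extra_cost = baseline_cost * (token_multiplier - 1.0)
    return extra_value - extra_cost


def break_even_value(
    baseline_cost: float,
    quality_uplift: float = 0.90,
    token_multiplier: float = 15.0,
) -> float:
    """Smallest task value at which the multi-agent run pays for itself."""
    return baseline_cost * (token_multiplier - 1.0) / quality_uplift


if __name__ == "__main__":
    print(round(break_even_value(0.50), 2))
    print(round(net_gain_per_task(100.0, 0.50), 2))
```

Under those assumptions, a hypothetical baseline run costing 50 cents breaks even when the task is worth a little under $8. Below that, the extra agents cost more than they return; well above it, the token bill stops mattering.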

Where the real work sits

BCG published a framework I keep coming back to. They call it the 10-20-70 rule. Ten percent of AI value comes from algorithms, twenty from data and tech, seventy from people and processes. However, Wharton ran the numbers and found 93% of AI investment is flowing into the technology layer, with only 7% going into people. That mismatch is the single best diagnosis of why so many pilots fail.

The other mismatch lives in security. Take EchoLeak, the zero-click prompt injection that exfiltrated SharePoint, OneDrive, and Teams data through a single email to Microsoft 365 Copilot. It was not a one-off bug. Aim Labs called it a structural class of vulnerability they named LLM Scope Violation. Any retrieval system that mixes trusted and untrusted content in the same context window carries the same exposure. So identity-aware orchestration, observability, and least-privilege scopes are the moat now. Models are not.
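One way to picture least-privilege scoping against that class of problem is to tag every piece of context with its provenance and shrink the agent's permissions whenever untrusted content is present. The sketch below is my own illustration under that assumption, not Microsoft's fix or Aim Labs' recommendation. The scope names and the `ContextItem` type are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class ContextItem:
    text: str
    trusted: bool  # True only for content the organisation itself controls


# Hypothetical scope names; anything that can move data outward counts as egress.
EGRESS_SCOPES: FrozenSet[str] = frozenset({"send_email", "fetch_url"})


def allowed_scopes(
    granted: FrozenSet[str], context: Iterable[ContextItem]
) -> FrozenSet[str]:
    """Drop every egress scope when any untrusted item is in the context."""
    if any(not item.trusted for item in context):
        return granted - EGRESS_SCOPES
    return granted


if __name__ == "__main__":
    granted = frozenset({"read_docs", "send_email", "fetch_url"})
    context = [
        ContextItem("Q3 pricing policy", trusted=True),
        ContextItem("inbound email from an unknown sender", trusted=False),
    ]
    print(sorted(allowed_scopes(granted, context)))
```

The design choice is deliberately blunt: the model may still read the injected instructions, but with no outbound channel in scope there is nowhere for the data to go.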

When clients ask me what to do about agentic AI in 2026, my answer is unglamorous. Pick one workflow where the value is high and the failure modes are tolerable. Then wrap it in a governance layer that treats agents as identities rather than features. Buy the solution rather than build it, because the same MIT report found purchased deployments succeed roughly twice as often as internal builds. Above all, spend on the seventy percent, not the ten.
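For what "agents as identities" can look like in practice, here is a minimal sketch. It is an assumption-laden illustration, not a reference to any vendor's product: `AgentIdentity`, `Gateway`, the scope strings, and the agent and owner names are all hypothetical. The idea is simply that each agent is its own principal with a named human owner and an explicit scope list, and every request is checked and logged.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str            # its own principal, not a shared service account
    owner: str               # the human or team accountable for it
    scopes: FrozenSet[str]   # everything it may do, and nothing else


@dataclass
class Gateway:
    # Each entry records (agent_id, action, allowed) for later review.
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    def authorize(self, agent: AgentIdentity, action: str) -> bool:
        """Allow the action only if it is in the agent's scopes; log either way."""
        allowed = action in agent.scopes
        self.audit_log.append((agent.agent_id, action, allowed))
        return allowed


if __name__ == "__main__":
    quote_bot = AgentIdentity(
        agent_id="quote-bot-01",
        owner="sales-ops",
        scopes=frozenset({"crm:read", "quote:create"}),
    )
    gateway = Gateway()
    print(gateway.authorize(quote_bot, "quote:create"))
    print(gateway.authorize(quote_bot, "db:drop"))
    print(gateway.audit_log)
```

Denied requests are logged alongside allowed ones, because the attempts an agent was refused are usually the more interesting half of the audit trail.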

The decade is long. The year was loud. The work is patient.

About Callum Gracie