There’s a stat making the rounds that should get your attention: AI agents failed at more than 96% of real-world jobs in a major new benchmark study.
If you’ve been nervous about AI, that might feel like relief. If you’ve been bullish, it might feel like a letdown.
Both reactions miss the point. For small business owners, this research — combined with what’s actually shipping in the tools you already use — paints a much more useful picture than either the hype or the panic.
What the Research Actually Found
The Remote Labor Index, published by the Center for AI Safety and Scale AI in October 2025, is one of the most rigorous benchmarks of AI’s ability to do real work. Not toy problems. Not chatbot demos. Real freelance projects sourced from Upwork, 240 of them, spanning 23 job categories including game development, product design, data analysis, architecture, and animation. The kind of work people actually get paid to do.
The researchers gave AI agents full autonomy to complete each project. The best-performing agent finished just 2.5% of them to an acceptable quality level.
These weren’t quick tasks, either. The average project required nearly 29 hours of human work and involved the kind of integrated judgment, context-switching, and creative problem-solving that current AI simply can’t replicate end-to-end.
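To put those percentages in concrete terms, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above (240 projects, a 2.5% completion rate, roughly 29 hours per project). The rounding and the variable names are ours, not the study's.

```python
# Figures quoted in the text above; the rounding is ours, not the study's.
TOTAL_PROJECTS = 240
BEST_COMPLETION_RATE = 0.025   # best agent: 2.5% finished to acceptable quality
AVG_HUMAN_HOURS = 29           # "nearly 29 hours" of human work per project


def summarize(total, rate, avg_hours):
    """Turn the headline percentages into approximate project counts and hours."""
    completed = round(total * rate)
    return {
        "completed": completed,
        "not_completed": total - completed,
        "failure_rate_pct": round((1 - rate) * 100, 1),
        "total_human_hours": total * avg_hours,
    }


summary = summarize(TOTAL_PROJECTS, BEST_COMPLETION_RATE, AVG_HUMAN_HOURS)
print(summary)
```

By this rough math, the best agent delivered acceptable work on about 6 of the 240 projects, out of a benchmark representing nearly 7,000 hours of human effort.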
The takeaway isn’t “AI is useless.” It’s that AI can’t replace the whole job. It can’t manage the full arc of complex professional work — not yet, and probably not for a while. That distinction matters enormously for how you should think about adopting it.
The Perception Gap
Here’s where it gets more nuanced. A randomized controlled trial by METR — published in July 2025 — studied experienced open-source software developers working on their own codebases. These are people who know their projects inside and out.
The finding: developers using AI coding tools were actually 19% slower than when they worked without them.
The uncomfortable part? Those same developers estimated they were 20% faster.
The people using AI thought it was helping. The stopwatch said otherwise.
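Here is a small sketch of how wide that gap is on a single task. It assumes "19% slower" means the task took 19% more time, and that "20% faster" means developers believed it took 20% less time. The 10-hour baseline is a hypothetical example, not a number from the study.

```python
def perception_gap(baseline_hours, actual_slowdown=0.19, perceived_speedup=0.20):
    """Compare measured vs. believed time for a task with a given no-AI baseline.

    Assumptions (ours): slowdown is extra time on top of the baseline,
    and the perceived speedup is a reduction in time from the baseline.
    """
    actual = baseline_hours * (1 + actual_slowdown)
    perceived = baseline_hours * (1 - perceived_speedup)
    return actual, perceived, actual - perceived


# Hypothetical task that takes 10 hours without AI.
actual, perceived, gap = perception_gap(10)
print(f"actual {actual:.1f} h, perceived {perceived:.1f} h, gap {gap:.1f} h")
```

Under those assumptions, a 10-hour task takes about 11.9 hours with AI while feeling like an 8-hour one. That is nearly four hours of difference that nobody notices without a stopwatch.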
This isn’t necessarily because the tools are bad. A likely explanation is that complex, context-heavy work, the kind where you need to hold an entire system in your head, doesn’t decompose cleanly into prompts. The overhead of managing AI suggestions, correcting its assumptions, and integrating its output into existing work can eat up more time than it saves.
For small business owners, this is a critical lesson: where you apply AI matters more than which AI you pick.
The Split That Should Shape Your Strategy
Anthropic has been publishing an Economic Index tracking how people actually use AI in real work. Their January 2026 report found something that should shape how you think about adoption:
52% of the AI use in that report was augmentation. 45% was automation.
Augmentation means a person is using AI to think better, draft faster, explore options, and validate decisions — the human stays in the loop. Automation means AI is handling the task start to finish.
Only about 4% of jobs had AI handling 75% or more of their tasks. The vast majority of real-world AI use looks like a skilled person working alongside a capable tool — not a robot replacing a human.
Here’s the other finding that matters: tasks requiring higher education showed greater time savings (roughly 12x speedup) compared to simpler tasks (roughly 9x). But success rates dropped as complexity increased — from about 70% on straightforward work to 66% on more complex tasks. AI gives you more leverage on harder work, but it’s also less reliable there. That tradeoff is the whole game.
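One way to see the tradeoff is to fold speed and reliability into a single number. The sketch below is a simplified model of our own, not something from Anthropic's report. It assumes every AI attempt costs a fraction of the manual time, and that a failed attempt means redoing the whole task by hand.

```python
def effective_speedup(nominal_speedup, success_rate):
    """Expected speedup when a failed AI attempt means redoing the task by hand.

    Assumption (ours, not the report's): every attempt costs 1/nominal_speedup
    of the manual time, and a failure adds the full manual time on top.
    """
    expected_time = 1 / nominal_speedup + (1 - success_rate)
    return 1 / expected_time


# Rough figures quoted in the text: 9x at ~70% success, 12x at ~66% success.
simpler = effective_speedup(nominal_speedup=9, success_rate=0.70)
harder = effective_speedup(nominal_speedup=12, success_rate=0.66)
print(f"simpler tasks: {simpler:.2f}x, harder tasks: {harder:.2f}x")
```

Under this model, both cases land at roughly 2.4x once failures are counted. The extra speed on harder work is largely offset by the extra rework. Real workflows are messier (a failed attempt is often partly salvageable), but the direction holds: the time spent checking and redoing output belongs in the estimate.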
Meanwhile, Your Spreadsheet Got Smarter Overnight
While the research community was publishing studies on what AI can’t do, Anthropic was quietly shipping something that shows what it can.
In late January 2026, Anthropic made Claude available inside Excel for all Pro subscribers. Then on February 5th, alongside a major model upgrade, Claude launched inside PowerPoint. Not as a chatbot in a sidebar — it reads your existing data, understands your tab structures, writes and debugs formulas, builds pivot tables, and generates slide decks that use your actual templates, fonts, and colors.
The Excel integration isn’t just about formulas. Anthropic built financial data connectors with Moody’s, the London Stock Exchange Group, and other institutional platforms — authenticated, structured data feeds that let the model pull real numbers into your spreadsheets. They also shipped pre-built financial workflows: comparable company analysis, discounted cash flow models, due diligence data packs. These aren’t templates you fill in. They’re intelligent workflows that understand how to structure the assumptions, link the cells, and build the supporting tabs.
Here’s what makes this relevant to the “AI fails at 96% of jobs” conversation: none of this is autonomous. A person describes what they need — a three-year operating model, a board deck, a competitive analysis — and AI handles the mechanical execution. The formulas, the formatting, the cross-referencing between tabs. The human provides the judgment about what to build and whether the output is right.
That’s augmentation. And it’s the exact pattern the research says actually works.
What makes this different from previous AI product launches is the upgrade cycle. When Anthropic shipped a new model on February 5th, every Claude-powered Excel and PowerPoint installation got smarter overnight. Nobody installed anything. Nobody downloaded a patch. The spreadsheet looked the same, but the reasoning improved — better context, fewer errors, deeper analysis. And that cycle repeats every few months.
Execution Is Getting Cheap. Judgment Isn’t.
These findings, taken together, point to a shift that should change how you think about your business.
For decades, professional value was built on execution skills. Can you build the spreadsheet? Can you structure the analysis? Can you format the presentation? Those skills created careers. They’re what hiring managers screened for and what universities taught.
That execution premium is eroding. Not because AI can do everything (the Remote Labor Index showed it can’t), but because AI can handle the mechanical parts of knowledge work fast enough that execution alone stops being a differentiator.
What isn’t eroding is judgment. Knowing which analysis matters. Knowing which assumptions to stress-test. Knowing when a technically correct model is answering the wrong question entirely. Knowing that a 40-slide deck isn’t as valuable as a 10-slide version that tells the right story.
For small business owners, this is actually liberating. You’ve always had the judgment — you know your market, your customers, your operations. What you’ve lacked is the execution bandwidth. AI is starting to close that gap.
But there’s a flip side. When production costs collapse, it becomes just as easy to generate polished work that looks professional and says nothing, which BetterUp Labs and Stanford researchers call “workslop.” The same capability that lets a thoughtful owner produce a day’s work in 30 minutes lets a careless one produce a week’s worth of polished nothing in an afternoon.
AI amplifies whatever it’s pointed at. If it’s pointed at clear thinking and real business knowledge, it amplifies your effectiveness. If it’s pointed at vague prompts and unexamined assumptions, it amplifies the noise.
What This Looks Like for a Small Business
If you’ve been sitting on the sidelines because the AI conversation feels like it’s either “replace everything” or “it doesn’t work,” here’s the reframe:
AI is a leverage tool, not a replacement. And leverage is exactly what small businesses need most.
You don’t have a department to throw at every problem. You probably don’t have an IT team. You might be the operations manager, the sales lead, and the quality control department all at once. The AI use cases that matter for you aren’t the flashy autonomous-agent demos. They’re the ones that give you back time on tasks you do every day.
Here’s what augmentation looks like in practice:
- Financial modeling: You need a three-year operating model to take to the bank for a small business loan. Instead of struggling through Excel or paying a consultant, you can describe your revenue targets, headcount plan, and cost structure — and AI can build you a working model to review, refine, and present.
- Proposal generation: You’re not asking AI to run your sales process. You’re asking it to take your notes from a site visit and draft a proposal in your format, with your pricing, so you can review and send it in a fraction of the time.
- Customer follow-up: You’re not automating your relationships. You’re using AI to draft follow-up emails after service calls so nothing falls through the cracks during your busiest weeks.
- Documentation: You’re not replacing your field expertise. You’re capturing it — turning the procedures that live in your head into written SOPs so your team stops calling you with the same questions.
- Data cleanup: You’re not handing your books to a bot. You’re using AI to categorize transactions, flag anomalies, and prep reports so your bookkeeper (or you, at 11 PM) can focus on the judgment calls.
None of these require AI to handle the entire job. They require it to handle the portion that’s mechanical, time-consuming, and low-judgment — so you can focus on the parts that actually need you.
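To make "mechanical and low-judgment" concrete, here is a minimal sketch of the anomaly-flagging step from the data cleanup example. It is an illustration under our own assumptions, not how any particular AI tool works: the sample transactions are hypothetical, and the rule (flag anything more than three times the median for its category) is a deliberately simple stand-in.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample data: (description, category, amount in dollars).
transactions = [
    ("Fuel - truck 1", "fuel", 82.10),
    ("Fuel - truck 2", "fuel", 79.45),
    ("Fuel - truck 1", "fuel", 415.00),
    ("Supplies - fittings", "supplies", 130.00),
    ("Supplies - fittings", "supplies", 142.50),
    ("Supplies - sealant", "supplies", 128.75),
]


def flag_anomalies(rows, ratio=3.0):
    """Return rows whose amount exceeds `ratio` times the median for their category."""
    by_category = defaultdict(list)
    for _, category, amount in rows:
        by_category[category].append(amount)
    medians = {cat: median(amounts) for cat, amounts in by_category.items()}
    return [row for row in rows if row[2] > ratio * medians[row[1]]]


flagged = flag_anomalies(transactions)
for description, category, amount in flagged:
    print(f"Review: {description} ({category}) ${amount:.2f}")
```

The code only surfaces the $415 fuel charge for review. Whether that was a legitimate fill-up for a long haul or a mistake is still your call, which is the division of labor the examples above describe.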
The Readiness Problem Is the Real Problem
Here’s what the “AI fails at 96% of jobs” framing misses: most businesses aren’t struggling because AI isn’t good enough. They’re struggling because they aren’t ready to use what’s already available.
The Anthropic data shows that success rates drop as task complexity increases. One reason that matters for your business: complex tasks require clean inputs, clear processes, and well-defined outputs. If those don’t exist, AI can’t help. Not because it’s incapable, but because it has nothing structured to work with.
And these tools are only getting better. Every few months, a new model ships and every AI-powered tool gets an automatic upgrade. The task AI couldn’t handle last quarter might be straightforward today. Your assumptions about what’s possible are likely already behind reality.
This is the same readiness gap we’ve written about in making your business AI-ready and the capability overhang. And it extends beyond operations — most businesses using AI haven’t written the policies governing how it’s used, either. The tools are already more capable than most small businesses realize. The bottleneck isn’t the technology. It’s the operational foundation.
If your processes aren’t documented, AI can’t follow them. If your data lives in text threads and sticky notes, AI can’t analyze it. If your service workflows change based on who’s working that day, AI can’t standardize them.
Getting ready for AI doesn’t start with picking a tool. It starts with knowing how your business actually runs.
Three Takeaways
1. Stop waiting for the “right” AI tool. The research is clear: AI isn’t going to handle your job end-to-end anytime soon. But it’s already saving meaningful time on specific, well-defined tasks — and it’s getting better every few months, automatically. The gap isn’t capability — it’s knowing which tasks to target.
2. Start with augmentation, not automation. The roughly 52/45 split exists for a reason. The highest-value use cases right now are the ones where AI makes a skilled person faster and more consistent — not the ones where it tries to replace them. If you’re a three-person shop, making each person meaningfully more effective can be transformative.
3. Invest in judgment, not just tools. AI can build the spreadsheet, draft the proposal, and format the presentation. What it can’t do is tell you whether you’re solving the right problem. The businesses that thrive in this environment won’t be the ones with the best AI tools — they’ll be the ones with the clearest understanding of their own operations, customers, and goals. Fix your operations first, and the tools will have something worth amplifying.
Where This Connects
This is the work we do at Moser Research. Not chasing the latest AI demo — building the operational foundation that makes AI actually useful for small businesses.
Our Operations Audit maps how your business actually runs today: where the bottlenecks are, where time disappears, and where AI can realistically help. Our Business Automation service implements targeted solutions for the specific tasks where AI delivers real returns — not hypothetical ones. And our Reliability Retainer keeps those systems running and evolving as the tools improve.
The headlines will keep swinging between “AI will replace everyone” and “AI doesn’t work.” The reality is quieter and more useful: AI is a powerful tool that rewards preparation. The businesses that treat it that way will outpace the ones still waiting for the hype to settle.
Ready to figure out where AI can actually help your business? Let’s talk about it.
This post draws on publicly available research including the Remote Labor Index (October 2025), METR’s developer productivity study (July 2025), and the Anthropic Economic Index (January 2026). Specific outcomes for your business will depend on your existing processes, infrastructure, and implementation approach.