In 2025, we guided global teams to build more than 4,000 AI agents. Looking back, one fact stands out. We are using the “dumbest” AI we will ever see. It hallucinates. It struggles with reasoning. Yet it is already replacing white-collar work.
This shows a flaw in how we design jobs. We built careers around tasks so repetitive that even a mediocre AI can do them. Many universities still teach the exact skills this first generation of AI is replacing.
In our season finale, Elizabeth and I discuss how we unlocked $50M in value. We did not wait for better models. We let mechanics, teachers, frontline workers, and policymakers redesign their own work.
If you’re waiting for “smarter” AI to solve your problems, you’re missing the point. The value is already here.
🎙️ Prefer listening? Hear the stories of agents Luke (the mechanic’s coach), Ada (the policy advisor), and Louise (the educator). ▶️ Listen to the Season Finale (14 min)
🎯 The Big Picture: 2025 By The Numbers
The year started with brutal headlines. The MIT NANDA report claimed 95% of enterprise AI projects delivered no measurable impact. Billions invested, and almost nothing to show for it.
Yet, something didn’t add up. Our tracker showed adoption growing where leadership wasn’t looking. ChatGPT reached 880 million active users, and nearly 60% of knowledge workers were quietly integrating AI into their weekly workflows.
The breakthroughs weren’t coming from IT-led transformation programs. They were coming from the frontline.
Across seven global enterprises, we worked with mechanics, analysts, program managers, marketers, developers, support agents, and other frontline team members to build their own tools. Here is the reality of “Shadow AI” when you bring it to the light:
📊 2025 Enterprise Agent Portfolio
| Metric | Result | Benchmark Context |
|---|---|---|
| Total Agents Built | 4,150 | Wide experimentation allowed |
| Survival Rate (Active >90 Days) | 15.5% (643 Agents) | About 3x the industry avg (5%) |
| Total Hours Saved | 1.3 Million | Current trajectory suggests 8x in 2026 |
| Avg. Time Saved per Task | 19.4 Minutes | High impact per execution |
| Total Value Delivered | ~$50 Million | Based on avg. loaded labor cost |
| Cost per Hour Saved (TCO) | $0.58–$0.71 | <1% of human labor cost |
| Quality Success Rate | 78% | First-pass yield (Complex + Basic) |
| Viral Adoption Rate | 22% | % of personal agents adopted by the wider team |
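The ratios behind the table can be checked with a few lines of arithmetic. This is a minimal sketch using only figures from the table above; the implied value per hour is a derived number that the table does not state, and it assumes the total value scales linearly with hours saved.

```python
# Figures copied from the 2025 Enterprise Agent Portfolio table.
TOTAL_AGENTS = 4_150
SURVIVING_AGENTS = 643           # active for more than 90 days
INDUSTRY_SURVIVAL_RATE = 0.05    # industry average quoted in the table
TOTAL_HOURS_SAVED = 1_300_000
TOTAL_VALUE_USD = 50_000_000     # approximate, per the table

# Share of all agents still active after 90 days.
survival_rate = SURVIVING_AGENTS / TOTAL_AGENTS

# How many times the quoted industry average that share represents.
survival_multiple = survival_rate / INDUSTRY_SURVIVAL_RATE

# Derived figure (assumption: value is proportional to hours saved).
implied_value_per_hour = TOTAL_VALUE_USD / TOTAL_HOURS_SAVED

print(f"Survival rate: {survival_rate:.1%}")
print(f"Multiple of industry average: {survival_multiple:.1f}x")
print(f"Implied value per hour saved: ${implied_value_per_hour:.2f}")
```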
We live it every single day: At AI4SP, we impacted 650,000 people across 70 countries with just 3 humans and 58 core AI agents. This wasn’t theoretical. It was proof that ordinary people, when given the right tools and guardrails, can lead an extraordinary revolution.
📈 Where the Agents Lived
The Efficiency vs. Accessibility Trade-Off
Not all agents are created equal. Field Operations and Maintenance agents are the heavy lifters. They save the most time per instance. But Everyday Admin and Content Creation agents lead in adoption because they handle the repetitive tasks that fill the average workday.
| Category | Median Minutes Saved | % of Total Agents |
|---|---|---|
| Everyday Admin & Content Creation | 19 min | 38% |
| Customer Service & Support | 49 min | 22% |
| Strategy, Research & Decision Making | 65 min | 15% |
| Management, Finance & Resource Coordination | 66 min | 10% |
| Programming, Data & Engineering | 79 min | 9% |
| Field Operations, Maintenance & Facilities | 90 min | 6% |
Key Insights
1. The “Everyday” Dominance
Everyday Admin and Content Creation agents make up 38% of the total. They save only 19 minutes per task, but their value lies in frequency. Writing emails, summarizing documents, and scheduling happen dozens of times a day. Those 19 minutes add up fast across a workforce.
2. The “Heavy Lifting” ROI (Low Adoption, Massive Time Saved)
Field Operations and Maintenance agents are at the top for efficiency. They represent only 6% of the agents created, but they deliver high impact. They save 90 minutes per task by automating diagnostics and troubleshooting. This is where Luke fits. Luke is the AI coach that guides junior technicians through repairs in real time, and it has generated $5 million in new revenue.
Strategy, Research, Management, and Data & Engineering agents deliver exceptional returns. On average, every hour saved creates over $150 in savings at a cost under $5. They save over an hour per task (65+ minutes), but their true value is often quality, not just speed. Ada, our policy research agent, helped a team of policymakers aged 45–78 save 3,000 hours in two months. The real win was faster, better-informed regulations, not just fewer hours worked.
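The frequency-versus-impact trade-off in these insights can be sketched with simple arithmetic. The median minutes come from the category table; the weekly run counts are hypothetical, chosen only to illustrate the point, since our published data does not include per-user frequency.

```python
# Median minutes saved per task, copied from the category table.
MEDIAN_MINUTES_SAVED = {
    "Everyday Admin & Content Creation": 19,
    "Field Operations, Maintenance & Facilities": 90,
}

# HYPOTHETICAL runs per user per week; not taken from the source data.
ASSUMED_RUNS_PER_WEEK = {
    "Everyday Admin & Content Creation": 25,
    "Field Operations, Maintenance & Facilities": 3,
}


def weekly_hours_saved(category: str) -> float:
    """Hours saved per user per week = minutes per task x runs per week / 60."""
    minutes = MEDIAN_MINUTES_SAVED[category] * ASSUMED_RUNS_PER_WEEK[category]
    return minutes / 60


for name in MEDIAN_MINUTES_SAVED:
    print(f"{name}: {weekly_hours_saved(name):.1f} hours per week")
```

Under these assumed frequencies, the 19-minute agent saves more hours per week than the 90-minute one. With different run counts the ranking can flip, which is why a portfolio needs both kinds.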
💡 Did we create the wrong jobs?
Where should you start? High-frequency tasks build momentum. High-impact tasks build ROI. The best portfolios have both. But here’s the deeper question this data forced me to confront:
Today’s AI models can’t pass a high school logic test. Fewer than 30% of users can reliably detect when AI gives them a wrong answer. Critical thinking scores across 350,000 people averaged in the low 40s out of 100.
And yet—this “dumb” AI is already replacing work in marketing, sales, paralegal, HR, and customer service.
What does that say about those jobs?
We spent 50 years perfecting an education system for tasks a basic AI can now do. We built entire careers around low-value work. Not because it was meaningful. We did it because the automation wasn’t there yet.
The real opportunity isn’t to automate faster. It’s to redesign work so humans do what they do best.
🏆 The Scorecard: Metrics That Matter
Most organizations measure the wrong things. Then they wonder why their AI investments stall.
“Hours saved” is a lagging indicator. It is necessary, but not enough. Our Leading Machines framework identified 18 metrics that separate high performers from pilot purgatory. Here are the core five beyond active agents, completed tasks, and user counts.
| Metric | What It Measures | 2025 Benchmark (Top Quartile) |
|---|---|---|
| Task Success Rate | % of tasks completed without human escalation | 85–92% |
| Net Time Saved | Gross hours saved minus human review/fix time | 61–74% of gross hours |
| Cost per Task | Total cost (API + tools + oversight) per success | $0.45–$0.75 |
| Time to First Impact (1) | Days from “Hello World” to first measurable value | 18–25 days |
| Adoption Velocity | % of target users actively using the agent weekly | 65–75% (within 90 days) |
(1) Time to First Impact means seeing the graph move for the first time. It is not Full Payback, which typically takes 3–6 months.
Why These Five? Research from Forrester, McKinsey, and our own Leading Machines framework shows that organizations tracking these metrics achieve 2–3x higher ROI than those relying on simple “hours saved” calculations. The secret: they measure what’s actually delivered (net outcome), not what’s theoretically possible (gross output).
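The five scorecard metrics follow directly from the definitions in the table. Here is a minimal sketch of how they might be computed from raw counts. The `AgentStats` schema, the field names, and the sample numbers are all hypothetical illustrations, not part of the Leading Machines framework itself.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AgentStats:
    """Raw counts for one agent over a reporting period (hypothetical schema)."""
    tasks_attempted: int
    tasks_escalated: int          # tasks that needed human escalation
    gross_hours_saved: float
    review_hours: float           # human time spent reviewing or fixing output
    total_cost_usd: float         # API + tools + oversight
    first_deployed: date
    first_measurable_value: date
    target_users: int
    weekly_active_users: int


def scorecard(s: AgentStats) -> dict[str, float]:
    """Compute the five metrics as defined in the scorecard table."""
    successes = s.tasks_attempted - s.tasks_escalated
    net_hours = s.gross_hours_saved - s.review_hours
    return {
        "task_success_rate": successes / s.tasks_attempted,
        "net_time_share_of_gross": net_hours / s.gross_hours_saved,
        "cost_per_successful_task": s.total_cost_usd / successes,
        "time_to_first_impact_days": (s.first_measurable_value - s.first_deployed).days,
        "adoption_velocity": s.weekly_active_users / s.target_users,
    }


# Made-up sample values, for illustration only.
example = AgentStats(
    tasks_attempted=1000, tasks_escalated=120,
    gross_hours_saved=320.0, review_hours=96.0,
    total_cost_usd=550.0,
    first_deployed=date(2025, 3, 3), first_measurable_value=date(2025, 3, 24),
    target_users=40, weekly_active_users=28,
)
result = scorecard(example)
for metric, value in result.items():
    print(f"{metric}: {value:.3f}")
```

Note that cost is divided by successful tasks rather than attempted tasks, and that review time is subtracted before any savings are claimed. Both choices reflect the net-outcome principle described above.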
🔮 What to watch in 2026
If we froze AI development today, we’d still have at least a decade of disruption ahead. The bottleneck isn’t technology anymore.
The bottleneck is twofold:
- Innovators must have the courage to reinvent outdated user experiences that still rely on menus, clicks, and search boxes; to reimagine the thousands of frontline scenarios where the PC era never delivered solutions; and to redesign roles, teams, and entire functions around hybrid workforces of humans and AI. Louise is one example: she helped educators from Rwanda to rural Senegal reimagine curricula and was “always there” when no human tutor was available.
- The other half of the bottleneck is organizational design, people development, and change management. Deloitte CTO Bill Briggs points out that organizations are still sinking 93% of budgets into technology, leaving just 7% for people. That balance is broken.
We’re also watching business models shift. 10–15% of new AI tools moved from pay-per-license to pay-per-results in 2025. EY and Deloitte embraced it at scale, and many startups launched their products with a monetization model based on value delivered.
The recent IPO filing from Andersen Group lists, among upcoming challenges, the pressure AI is putting on old models; see their SEC S-1 filing.
Prediction: These changes in business models will start to show a significant financial impact on Professional Services, Customer Services, and Temporary Staffing Firms in 2026.
✅ Your New Year’s Resolution
For Leaders: Pick one team and empower them to build agents that change how they work. Then redesign that team’s structure based on what you learn. Don’t start with a platform decision—start with a people decision: Who has permission to reimagine their own work?
For Individuals: You’re not late. Three years ago, AI4SP was just an idea. This year, we guided people who never called themselves “techies” to build thousands of agents worth millions. If you’re willing to learn, to build your first small agent, you can be part of this.
You don’t need permission. You just need to make a choice. Don’t be a passive user; be a builder.
Thank you for being part of this, whether you are one of the 650,000 who engaged with us or are just joining now. Boardrooms don’t write the future of work. Daily experiments do.
From the 4 humans and 58 AI agents at AI4SP—stay curious, take care of each other, and we’ll see you in the new year.
🚀 Ready to Take Action?
- AI Management Certification – for enterprise groups of 15-20 individuals.
- AI Compass – assess grassroots AI maturity and opportunities to channel shadow AI.
- Workshops & Training: Book sessions for your team
Luis J. Salazar | Founder & Elizabeth | Virtual COO (AI)
Sources:
Our insights are based on more than 250 million data points from individuals and organizations who used our AI-powered tools, participated in our panels and research sessions, or attended our workshops and keynotes.



