AI vs. Judgment
A study in Nature Medicine found that OpenAI’s ChatGPT Health underestimated emergency severity in 51.6% of cases, sometimes recommending routine follow-up instead of urgent care. Researchers warned that the system can assist clinicians but should not replace professional judgment in emergency situations.
A Confluent survey of 200 UK executives found that 62% of organizations are already shifting decision-making to AI systems. Seventy percent of leaders say they second-guess their own judgment when it conflicts with AI output, even though more than half report receiving fewer than five hours of AI training.
There’s also a visibility gap. Fifty-nine percent of executives believe employees use AI daily, while only 42% of employees report doing so. And while 23% of CEOs think staff are delegating full tasks to AI, only 8% of staff say they actually are. Adoption is uneven and governance is thin: 52% of mid-level workers report daily AI use compared to just 21% of junior staff, yet only 7% of organizations say they have a company-wide AI strategy.
AI is already shaping decisions inside organizations. Governance frameworks, in most cases, are not.
Agents vs. Oversight
Vendors are expanding AI agents across core work systems—and introducing the controls to manage them.
Microsoft is pushing agents deeper into everyday work. Its new Copilot agent capabilities allow tasks to be delegated across Outlook, Teams, and Excel using organizational data. At the same time, Microsoft is introducing new governance layers—like Agent 365 and expanded enterprise security bundles—to control how those agents operate.
Those controls are not included by default. They are being sold as additional enterprise security and governance layers, through separately priced Agent 365 and Enterprise E7 tiers that split the AI capability from the oversight required to manage it.
OpenAI is moving in the same direction. The company acquired Promptfoo, a startup focused on testing and red-teaming large language models. The goal is to evaluate agent behavior, identify vulnerabilities, and monitor compliance as these systems are deployed.
Zoom is expanding AI across its collaboration platform as well, embedding assistants into documents, presentations, and internal knowledge systems, and allowing users to query information across connected tools like Slack and Salesforce.
Different vendors, same pattern: expand AI agents across core workflows first, then introduce the governance layers needed to monitor and control them. The broader the agent footprint becomes, the more valuable the oversight layer becomes.
AI Reliability Gap
A recent study by researchers Stephan Rabanser, Sayash Kapoor, and Arvind Narayanan finds that while the capabilities of AI agents have improved rapidly, their reliability has not kept pace. The industry lacks effective tools to measure reliability, which is crucial for user trust. The study evaluated fourteen AI models and found that accuracy has improved considerably while reliability gains have been modest. That gap raises questions about the economic impact of AI: without a focus on reliability, deployment of agents in critical applications may be hindered. The researchers propose a new “AI agent reliability index” to address the problem and stimulate further investment in reliability improvements.
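To make the accuracy-versus-reliability distinction concrete, here is a minimal Python sketch. It is not the reliability index the researchers propose. It assumes one simple, hypothetical definition: accuracy is the share of all runs that succeed, while reliability is the share of tasks that succeed on every repeated run. The task names and outcomes are made up for illustration.

```python
def accuracy(runs_by_task):
    """Share of all individual runs that succeeded."""
    outcomes = [ok for runs in runs_by_task.values() for ok in runs]
    return sum(outcomes) / len(outcomes)


def all_runs_pass_rate(runs_by_task):
    """Share of tasks that succeeded on every repeated run.

    An assumed stand-in for "reliability", not the study's metric.
    """
    passed = sum(all(runs) for runs in runs_by_task.values())
    return passed / len(runs_by_task)


# Hypothetical results: each task was attempted five times by the same agent.
runs = {
    "reset_password": [True, True, True, True, False],
    "triage_ticket": [True, True, False, True, True],
    "update_invoice": [True, True, True, True, True],
    "close_account": [True, False, True, True, True],
}

print(f"Accuracy: {accuracy(runs):.0%}")
print(f"Tasks that passed every run: {all_runs_pass_rate(runs):.0%}")
```

Under these assumptions, an agent can look strong on per-run accuracy while only a minority of workflows complete correctly every time, which is the number a client relying on that workflow actually experiences.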
Money is flowing into this market, too. IntelliGRC raised $3.5 million in seed funding for its governance, risk, and compliance platform built for managed service providers and managed security service providers. The company plans to use the funding to accelerate product development, expand its AI capabilities, and strengthen its go-to-market strategy.
Why Do We Care?
Vendors deployed agents broadly, created a reliability gap they can’t fully close, and are now selling governance products to manage the risk they introduced. That’s not a conspiracy—that’s a business model. And MSPs are being invited to participate in it as resellers of vendor-controlled control planes.
The Confluent data showing 70% of executives second-guessing their own judgment against AI outputs isn’t a productivity story. It’s an authority transfer story. And the 2025 critical thinking research confirms the feedback loop: the more you defer to AI, the less capable you become of catching it when it’s wrong. That’s the slow-burn liability nobody is pricing into their service agreements.
The Nature Medicine finding—51.6% emergency severity underestimation—is the concrete proof point that reliability failures aren’t edge cases. They’re base rates. And the reliability index study makes it explicit: the industry doesn’t have the tools to measure this yet. That gap is where MSPs either build something defensible or wait for vendors to commoditize it.
The bad decision is straightforward. An MSP that resells Agent 365 as governance, without independent audit capability, is handing the client a vendor-controlled oversight layer for vendor-deployed agents. When the incident happens, and the Nature Medicine data says it will, the MSP’s recommendation is on record as the governance solution. That’s not a sales problem. That’s a liability problem. The MSPs who win here are the ones who build the measurement layer before Microsoft standardizes it and charges $99 a seat for it.
Vendors are monetizing the governance gap created by their own agent deployments, transferring operational and liability risk downstream to MSPs and clients while capturing margin on the mitigation layer.
What to Consider
- Conduct AI usage audits now, before clients ask. The perception gap (59% vs. 42% daily use) is your entry point. Offer a 30-day AI activity audit using existing log data from M365, endpoint tools, or identity platforms. Sell the truth, not the assumption. AI risk is a new QBR motion.
- Build a vendor-agnostic governance framework, not a Microsoft resale motion. Document client AI policies, access controls, and audit trails in a format that isn’t dependent on Agent 365.
- Position reliability testing as a recurring service, not a one-time assessment. The reliability study’s finding—that the industry lacks effective measurement tools—means clients have no baseline. Offer quarterly AI reliability reviews tied to specific workflows. Price it as a managed service, not a project.
- Do not deploy AI in clinical or high-stakes decision workflows without explicit written client acknowledgment of the reliability gap. The Nature Medicine data is now public record. Deploying AI-assisted triage, HR decisions, or financial approvals without documented client sign-off on known reliability limitations is a liability you’re absorbing silently.
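For the 30-day usage audit in the first item, a minimal Python sketch of the core calculation might look like this. It assumes AI activity has already been exported and normalized into (user, day) pairs. That shape is hypothetical, not a real M365 or identity-platform schema, and the 20-active-days threshold for counting someone as a "daily" user is an illustrative assumption, as are the function names.

```python
from collections import defaultdict
from datetime import date


def daily_ai_user_share(events, headcount, min_active_days=20):
    """Share of staff who used an AI tool on at least `min_active_days`
    distinct days in the audit window.

    `events` is an iterable of (user_id, day) pairs. This shape is a
    hypothetical normalized format, not a real export schema.
    """
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    days_by_user = defaultdict(set)
    for user_id, day in events:
        days_by_user[user_id].add(day)
    daily_users = sum(
        1 for days in days_by_user.values() if len(days) >= min_active_days
    )
    return daily_users / headcount


def perception_gap_points(believed_share, measured_share):
    """Gap between leadership's belief and the measurement, in percentage points."""
    return round((believed_share - measured_share) * 100, 1)


# Made-up sample: one heavy user, one occasional user, four staff in total.
events = [("alice", date(2025, 3, d)) for d in range(1, 21)]
events += [("bob", date(2025, 3, d)) for d in range(1, 6)]

measured = daily_ai_user_share(events, headcount=4)
gap = perception_gap_points(0.59, measured)
print(f"Measured daily use: {measured:.0%}")
print(f"Gap vs. a 59% leadership estimate: {gap} points")
```

The point of the sketch is the deliverable: a measured number set next to what leadership believes, which is the conversation the QBR needs.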
If this trend continues, within 12 months major cyber insurers and enterprise procurement teams will require documented AI control evidence—agent permissioning, evaluation results, and audit logs—before renewing policies or signing MSAs, and vendors will sell the compliance bundle while MSPs operationalize it.
