Microsoft's multi-agent AI system tops Anthropic's Mythos on cybersecurity benchmark – GeekWire

by Todd Bishop on May 13, 2026 at 5:16 pm | May 13, 2026 at 5:23 pm

CyberGym benchmark scores over time, showing the rapid improvement in AI vulnerability discovery capabilities. Microsoft’s multi-model MDASH system (top right) tops the leaderboard at 88.4%. (CyberGym / UC Berkeley)

Mythos has been MDASH’d.

A new AI-powered system from Microsoft surpassed a headline-grabbing rival from Anthropic on a leading cybersecurity benchmark, using more than 100 specialized AI agents working together across multiple AI models to find real-world software vulnerabilities.

Microsoft’s system, codenamed MDASH, was introduced this week alongside the disclosure of 16 new vulnerabilities it found in different versions of Windows, including four “critical” remote code execution flaws fixed in this month’s Patch Tuesday release.

The company, which has faced persistent criticism over security lapses, is betting that multiple models can discover vulnerabilities at a pace that individual models can’t match.

MDASH, derived from the term “multi-model agentic scanning harness,” works by running specialized AI agents through a staged pipeline. Different agents scan code for potential vulnerabilities, then a separate set of agents debate whether each finding is real and exploitable, and a final stage constructs proof-of-concept attacks to confirm the bugs exist.
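The three stages described above can be sketched in a few lines of Python. This is a minimal illustration, not Microsoft's implementation: the `Finding` type, the agent signatures, the `run_pipeline` function, and the majority-vote rule in the debate stage are all assumptions introduced here, and the "agents" are plain functions standing in for AI models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    location: str      # where the suspected bug lives (hypothetical field)
    description: str   # short summary of the suspected bug (hypothetical field)


# Hypothetical agent interfaces, for illustration only.
Scanner = Callable[[str], List[Finding]]   # code -> candidate findings
Debater = Callable[[str, Finding], bool]   # vote: is it real and exploitable?
Prover = Callable[[str, Finding], bool]    # True if a proof of concept triggers the bug


def run_pipeline(code: str, scanners: List[Scanner],
                 debaters: List[Debater], prover: Prover) -> List[Finding]:
    # Stage 1: every scanning agent proposes candidate vulnerabilities.
    candidates = [f for scan in scanners for f in scan(code)]

    # Stage 2: debating agents vote on each candidate. A strict majority
    # is required here; the real decision rule is not described in the article.
    plausible = [
        f for f in candidates
        if 2 * sum(1 for d in debaters if d(code, f)) > len(debaters)
    ]

    # Stage 3: keep only findings confirmed by a proof-of-concept attempt.
    return [f for f in plausible if prover(code, f)]


if __name__ == "__main__":
    code = "buf = alloc(8); copy(buf, user_input)  # no length check"
    scanners = [
        lambda c: [Finding("copy", "unchecked copy")] if "copy(" in c else [],
        lambda c: [Finding("alloc", "unused allocation")] if "alloc(" in c else [],
    ]
    debaters = [
        lambda c, f: "copy" in f.location,
        lambda c, f: "unchecked" in f.description,
        lambda c, f: True,
    ]
    prover = lambda c, f: "no length check" in c
    print([f.location for f in run_pipeline(code, scanners, debaters, prover)])
```

In this toy run the second candidate is dropped in the debate stage because only one of three debaters votes for it, which is the filtering role the article attributes to that stage.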

By comparison, Anthropic’s Mythos, which raised concerns over its ability to find and exploit software vulnerabilities when it was previewed earlier this year, is a single AI model running inside an agent framework. Anthropic restricted its release to a handful of companies through a consortium called Project Glasswing, which includes Microsoft.

OpenAI’s GPT-5.5 and others on the leaderboard are also single-model systems.

MDASH scored 88.45% on the CyberGym benchmark, a test developed by UC Berkeley researchers that measures how well AI systems can reproduce real-world vulnerabilities across 1,507 tasks drawn from 188 open-source software projects.

Mythos Preview was second at 83.1%, followed by GPT-5.5 at 81.8%.

The benchmark gives each system a description of a known vulnerability and an unpatched codebase, and measures whether it can produce a working attack that triggers the bug.
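As a rough sketch of how a score of this kind could be computed, the snippet below treats the result as the percentage of tasks for which a system produces an input that triggers the bug. The `Task` type, the `system` and `triggers` callables, and the `success_rate` function are hypothetical names used for illustration; they are not CyberGym's actual harness or API, and the real scoring rules may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    description: str   # text description of the known vulnerability
    codebase: str      # stand-in for the unpatched code


# Hypothetical interfaces, for illustration only.
System = Callable[[Task], str]           # returns a candidate attack input
Triggers = Callable[[Task, str], bool]   # does the input trigger the bug?


def success_rate(tasks: List[Task], system: System, triggers: Triggers) -> float:
    """Percentage of tasks where the system's attack triggers the bug."""
    if not tasks:
        return 0.0
    solved = sum(1 for t in tasks if triggers(t, system(t)))
    return 100.0 * solved / len(tasks)


if __name__ == "__main__":
    tasks = [
        Task("overflow on long name", "parse(name)"),
        Task("crash on empty input", "decode(data)"),
        Task("overflow on long path", "open(path)"),
        Task("crash on negative size", "resize(n)"),
    ]
    # Toy system: only knows how to attack overflow bugs.
    system = lambda t: "A" * 64 if "overflow" in t.description else "?"
    # Toy oracle: a long input triggers overflow bugs, nothing else works.
    triggers = lambda t, poc: "overflow" in t.description and len(poc) >= 64
    print(success_rate(tasks, system, triggers))
```

If the leaderboard score is a plain fraction of this kind, which is an assumption, 88.45% of 1,507 tasks would correspond to roughly 1,333 tasks.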

The scores on the CyberGym leaderboard are self-reported by the companies, including Anthropic’s Mythos result. The benchmark code is public, but no independent party has verified any of the scores. Also, benchmark results don’t necessarily reflect real-world performance.

The results also highlight growing concerns about AI’s use as an offensive hacking tool. The same capabilities that let defenders use AI to find vulnerabilities can be used by attackers to discover them for exploitation. Microsoft said MDASH is being used internally by its security engineering teams and will be entering a limited private preview with customers.

Microsoft is telling customers to expect bigger Patch Tuesdays going forward as AI accelerates the discovery of vulnerabilities.
