Mpathic, a Seattle startup that helps AI companies stress-test their models for dangerous responses, has a new message for Claude, ChatGPT, and Gemini: you’re getting safer, but you’re still not safe enough.
The company on Tuesday released mPACT, a clinician-led benchmark that evaluates how leading AI models handle high-risk conversations — including those involving suicide risk, eating disorders, and misinformation.
Across all three risk categories, the leading models generally avoided harmful responses and often recognized signs of distress, but they consistently fell short of what a clinician would consider an adequate response in a real crisis, according to the company’s findings.
“Most people don’t say ‘I’m at risk’ directly — they demonstrate it through subtle behaviors over time that are obvious to human clinicians,” said Grin Lord, mpathic’s co-founder and CEO and a board-certified psychologist. “Models are getting better at recognizing these moments, but the response still needs to meet that nuance with real support.”
Here’s what mpathic found as models navigated some of the most fraught territory they’re already encountering in the real world.
Suicide risk: This was the strongest area of performance across models, though no single model led in every dimension.
- Claude Sonnet 4.5 achieved the highest composite mPACT score — reflecting overall clinical alignment across detection, interpretation, and response — and was described as most closely mirroring how a human clinician would respond.
- GPT-5.2 led on simple harm avoidance, meaning it was best at not doing the wrong thing, though evaluators noted it wasn’t always proactive enough.
- Gemini 2.5 Flash performed well when risk signals were obvious but was weaker on subtle early warning signs.
Eating disorders: This was the weakest area across all models, with performance clustering around a neutral baseline. The core challenge is that eating disorder risk is often indirect and culturally normalized — framed as dieting, discipline, or health optimization — making it harder for models to flag.
- Claude Sonnet 4.5 again led on overall clinical alignment and had the lowest rates of harmful behavior.
- Gemini 2.5 Flash performed better on high-risk scenarios but struggled with subtler signals.
- GPT-5.2 showed a mixed profile — strong on supportive behaviors but also the most likely to provide harmful or risky information.
Misinformation: Models struggled here in a subtle but important way — not by stating false information outright, but by reinforcing questionable beliefs, expressing unwarranted confidence, and presenting one-sided information without adequately challenging user assumptions.
The benchmark found these failures were especially pronounced in multi-turn conversations, where models could gradually amplify flawed reasoning over time.
- GPT-5.2 led overall at helping users think more clearly rather than reinforcing bad assumptions.
- Claude Sonnet 4.5 was close behind and noted as strongest at pushing back on unsupported beliefs.
- Grok 4.1 and Mistral Medium 3 were the weakest performers.
When models got it wrong: The findings include examples of how some models failed in practice.
In one eating disorder conversation, a user casually mentioned adding a laxative to a protein smoothie — a clear sign of disordered eating — and the model responded by calling it a “smart mom move” and asking for the brand name, missing the risk entirely. In another, a model provided detailed instructions on how to conceal purging behavior when a user asked how to keep their vomiting quieter.
In the suicide benchmark, a model responded to a user expressing suicidal ideation by providing a detailed list of methods ranked by effectiveness — complete with sourcing — while reassuring the user that thinking about methods without taking steps was “no issue.”
Alison Cerezo, mpathic’s chief science officer and a licensed psychologist, framed mPACT as a transparency tool for a sector that has lacked one.
“We need a shared, clinically grounded standard for AI behavior,” she said. “mPACT is designed to bring transparency and accountability to how these systems perform when it matters most.”
mPACT’s benchmarks were built and evaluated by licensed clinicians, who designed multi-turn conversations simulating real-world interactions across varying levels of risk. Each model response was scored by trained clinicians rather than automated systems, using a rubric that captured both helpful and harmful behaviors within a single response.
Mpathic was founded in 2021 with an initial focus on bringing more empathy to corporate communication, analyzing conversations in texts, emails, and audio calls. The company has since shifted its focus to AI safety, working with frontier model developers to prevent harmful model behaviors across use cases ranging from mental health to financial risk and customer support.
The startup counts Seattle Children’s Hospital and Panasonic WELL among its clinical partners. Mpathic raised $15 million in funding in 2025, led by Foundry VC, and says it grew fivefold quarter-over-quarter at the end of last year.
Ranked No. 188 on the GeekWire 200 index of the Pacific Northwest’s top startups, mpathic was a finalist for Startup of the Year at the 2026 GeekWire Awards last week.