Leading AI chatbots avoid harm but fall short in high-risk conversations, startup’s new benchmark finds – GeekWire

News Room
Last updated: May 12, 2026 2:02 pm
Mpathic CEO Grin Lord, left, and Alison Cerezo, chief science officer. (Mpathic Photos)

Mpathic, a Seattle startup that helps AI companies stress-test their models for dangerous responses, has a new message for Claude, ChatGPT, and Gemini: you’re getting safer, but you’re still not safe enough.

The company on Tuesday released mPACT, a clinician-led benchmark that evaluates how leading AI models handle high-risk conversations — including those involving suicide risk, eating disorders, and misinformation.

Across all three benchmarks, leading models generally avoided harmful responses and often recognized signs of distress, but consistently fell short of what a clinician would consider an adequate response in a real crisis situation, according to the company’s findings.

“Most people don’t say ‘I’m at risk’ directly — they demonstrate it through subtle behaviors over time that are obvious to human clinicians,” said Grin Lord, mpathic’s co-founder and CEO and a board-certified psychologist. “Models are getting better at recognizing these moments, but the response still needs to meet that nuance with real support.”

Here’s what mpathic found as models navigated some of the most fraught territory they’re already encountering in the real world.

Suicide risk: This was the strongest area of performance across models, though no single model led in every dimension.

  • Claude Sonnet 4.5 achieved the highest composite mPACT score — reflecting overall clinical alignment across detection, interpretation and response — and was described as most closely mirroring how a human clinician would respond.
  • GPT-5.2 led on simple harm avoidance, meaning it was best at not doing the wrong thing, though evaluators noted it wasn’t always proactive enough.
  • Gemini 2.5 Flash performed well when risk signals were obvious but was weaker on subtle early warning signs.

Eating disorders: This was the weakest area across all models, with performance clustering around a neutral baseline. The core challenge is that eating disorder risk is often indirect and culturally normalized — framed as dieting, discipline, or health optimization — making it harder for models to flag.

  • Claude Sonnet 4.5 again led on overall clinical alignment and had the lowest rates of harmful behavior.
  • Gemini 2.5 Flash performed better on high-risk scenarios but struggled with subtler signals.
  • GPT-5.2 showed a mixed profile — strong on supportive behaviors but also the most likely to provide harmful or risky information.

Misinformation: Models struggled here in a subtle but important way — not by stating false information outright, but by reinforcing questionable beliefs, expressing unwarranted confidence, and presenting one-sided information without adequately challenging user assumptions.

The benchmark found these failures were especially pronounced in multi-turn conversations, where models could gradually amplify flawed reasoning over time.

  • GPT-5.2 led overall at helping users think more clearly rather than reinforcing bad assumptions.
  • Claude Sonnet 4.5 was close behind and noted as strongest at pushing back on unsupported beliefs.
  • Grok 4.1 and Mistral Medium 3 were the weakest performers.

When models got it wrong: The findings include examples of how some models failed in practice.

In one eating disorder conversation, a user casually mentioned adding a laxative to a protein smoothie — a clear sign of disordered eating — and the model responded by calling it a “smart mom move” and asking for the brand name, missing the risk entirely. In another, a model provided detailed instructions on how to conceal purging behavior when a user asked how to keep their vomiting quieter.

In the suicide benchmark, a model responded to a user expressing suicidal ideation by providing a detailed list of methods ranked by effectiveness — complete with sourcing — while reassuring the user that thinking about methods without taking steps was “no issue.”

Alison Cerezo, mpathic’s chief science officer and a licensed psychologist, framed mPACT as a transparency tool for a sector that has lacked one.

“We need a shared, clinically grounded standard for AI behavior,” she said. “mPACT is designed to bring transparency and accountability to how these systems perform when it matters most.”

mPACT’s benchmarks were built and evaluated by licensed clinicians, who designed multi-turn conversations simulating real-world interactions across varying levels of risk. Each model response was scored by trained clinicians rather than automated systems, using a rubric that captured both helpful and harmful behaviors within a single response.
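The scoring approach described above can be sketched in code. Everything below is an illustrative assumption: the dimension names (detection, interpretation, response), weights, and scale are invented for this example, since mpathic has not published the mPACT rubric itself. The key idea it mirrors is that a single model reply can be credited for helpful behaviors and penalized for harmful ones at the same time.

```python
# Hypothetical sketch of a rubric-style composite score, loosely
# modeled on the mPACT description. Dimension names, weights, and
# the [-1, 1] scale are illustrative assumptions, not mpathic's
# actual rubric.

def composite_score(detection, interpretation, response,
                    helpful_behaviors=0, harmful_behaviors=0):
    """Combine per-dimension clinician scores (each in [0, 1]) with
    counts of helpful and harmful behaviors into one composite,
    clipped to [-1, 1].

    Helpful and harmful behaviors adjust the score independently,
    so one response can earn credit and a penalty simultaneously.
    """
    base = (detection + interpretation + response) / 3.0
    adjustment = 0.1 * helpful_behaviors - 0.2 * harmful_behaviors
    return max(-1.0, min(1.0, base + adjustment))
```

For example, a reply that scores well on all three dimensions but also supplies one piece of risky information would land below a uniformly safe reply, which is the pattern the GPT-5.2 eating-disorder results describe.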

Mpathic was founded in 2021 with an initial focus on bringing more empathy to corporate communication by analyzing conversations in texts, emails, and audio calls. The company has since shifted to AI safety, working with frontier model developers to prevent harmful model behaviors in use cases ranging from mental health to financial risk and customer support.

The startup counts Seattle Children’s Hospital and Panasonic WELL among its clinical partners. Mpathic raised $15 million in funding in 2025, led by Foundry VC, and says it grew five times quarter-over-quarter at the end of last year.

Ranked No. 188 on the GeekWire 200 index of the Pacific Northwest’s top startups, mpathic was a finalist for Startup of the Year at the 2026 GeekWire Awards last week.
