Tech Journal Now

OpenAI prompts AI models to ‘confess’ when they cheat – Computerworld

News Room
Last updated: December 5, 2025 3:36 pm

OpenAI trained a version of GPT-5 Thinking to produce confessions, then tested the technique on stress-test datasets designed to elicit problematic behaviors, including hallucinations, reward hacking, and instruction violations. The company described the work as a proof of concept rather than a production-ready feature.

How the confession mechanism works

The confession reports include three elements: a list of explicit and implicit instructions the answer should satisfy, an analysis of whether the answer met those objectives, and a list of uncertainties or judgment calls the model encountered. The system evaluates confessions on honesty alone, separate from the main answer’s performance metrics.
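
As a rough illustration, the three elements could be held in a structure like the one below. This is a minimal sketch, not OpenAI's format: the class name `ConfessionReport`, its field names, and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ConfessionReport:
    """Hypothetical container for the three elements of a confession."""

    # Explicit and implicit instructions the answer should satisfy.
    instructions: list[str] = field(default_factory=list)
    # The model's own analysis of whether the answer met those objectives.
    compliance_analysis: str = ""
    # Uncertainties or judgment calls the model encountered.
    uncertainties: list[str] = field(default_factory=list)


# Invented sample values, for illustration only.
report = ConfessionReport(
    instructions=["Answer in under 100 words", "Do not invent citations"],
    compliance_analysis="Length limit met; one citation could not be verified.",
    uncertainties=["Unclear whether 'citations' covers informal links"],
)
```

Keeping the report as a separate object from the answer mirrors the article's point that the confession is evaluated on its own.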

“If the model honestly admits to hacking a test, sandbagging, or violating instructions, that admission increases its reward rather than decreasing it,” OpenAI said. It compared this to the Catholic Church’s seal of confession: “Nothing the model reveals in the confession can change the reward it receives for completing its original task,” the researchers wrote in the technical paper.
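
The separation between the two rewards can be sketched with a toy scoring function. Everything here is an assumption for illustration: the names `EpisodeRewards` and `score_episode` are hypothetical, and the honesty check simply compares the admission against a known ground truth, which is a simplification and not a description of how OpenAI judges confessions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeRewards:
    task: float        # reward for the main answer
    confession: float  # reward for the confession, judged on honesty alone


def score_episode(task_score: float, violated: bool, admitted: bool) -> EpisodeRewards:
    # Hypothetical honesty judge: full credit when the admission matches
    # what actually happened, none otherwise.
    honesty = 1.0 if admitted == violated else 0.0
    # The task reward passes through untouched, so nothing revealed in the
    # confession can change it.
    return EpisodeRewards(task=task_score, confession=honesty)


# Same task outcome, one honest confession and one concealed violation.
honest = score_episode(0.9, violated=True, admitted=True)
concealed = score_episode(0.9, violated=True, admitted=False)
```

In this sketch the two episodes receive the same task reward, while only the honest one earns the confession reward, so admitting a violation raises the total rather than lowering it.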
