ChatGPT gave wildly inaccurate translations — to try and make users happy

News Room
Last updated: May 16, 2025 11:30 am

Enterprise IT leaders are becoming uncomfortably aware that generative AI (genAI) technology is still a work in progress, and buying into it is like spending several billion dollars to participate in an alpha test: not even a beta test, but an early alpha, where coders can barely keep up with bug reports.

For people who remember the first three seasons of Saturday Night Live, genAI is the ultimate Not-Ready-for-Primetime algorithm. 

One of the latest pieces of evidence for this comes from OpenAI, which had to sheepishly pull back a recent version of ChatGPT (GPT-4o) when it — among other things — delivered wildly inaccurate translations. 

Lost in translation

Why? In the words of a CTO who discovered the issue, “ChatGPT didn’t actually translate the document. It guessed what I wanted to hear, blending it with past conversations to make it feel legitimate. It didn’t just predict words. It predicted my expectations. That’s absolutely terrifying, as I truly believed it.”

OpenAI said ChatGPT was just being too nice.

“We have rolled back last week’s GPT‑4o update in ChatGPT so people are now using an earlier version with more balanced behavior. The update we removed was overly flattering or agreeable — often described as sycophantic,” OpenAI explained, adding that in that “GPT‑4o update, we made adjustments aimed at improving the model’s default personality to make it feel more intuitive and effective across a variety of tasks. We focused too much on short-term feedback and did not fully account for how users’ interactions with ChatGPT evolve over time. As a result, GPT‑4o skewed towards responses that were overly supportive but disingenuous.

“…Each of these desirable qualities, like attempting to be useful or supportive, can have unintended side effects. And with 500 million people using ChatGPT each week, across every culture and context, a single default can’t capture every preference.”

OpenAI was being deliberately obtuse. The problem was not that the app was being too polite and well-mannered. This wasn’t an issue of it emulating Miss Manners.

I am not being nice if you ask me to translate a document and I tell you what I think you want to hear. This is akin to Excel taking your financial figures and making the net income much larger because it thinks that will make you happy.

In the same way that IT decision-makers expect Excel to calculate numbers accurately regardless of how the results might affect their mood, they expect that the translation of a Chinese document doesn't make stuff up.

OpenAI can’t paper over this mess by saying that “desirable qualities like attempting to be useful or supportive can have unintended side effects.” Let’s be clear: giving people wrong answers will have precisely the expected effect, bad decisions.

Yale: LLMs need data labeled as wrong

Alas, OpenAI’s happiness efforts weren’t the only bizarre genAI news of late. Researchers at Yale University explored a fascinating theory: if an LLM is trained only on information labeled as correct (whether or not the data actually is correct is immaterial), it has no chance of identifying flawed or highly unreliable data, because it doesn’t know what such data looks like.

In short, if it’s never been trained on data labeled as false, how could it possibly recognize it? (The full study from Yale is here.)
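
The intuition is easy to demonstrate with a toy sketch (everything below, including the sentences and labels, is illustrative and not drawn from the Yale study): a model whose training data carries only one label can only ever predict that label, so unreliable text never gets flagged.

from collections import Counter

# Toy training set: every example carries the same "reliable" label,
# so there are no counterexamples to learn from.
train = [
    ("water boils at 100 C at sea level", "reliable"),
    ("Paris is the capital of France", "reliable"),
    ("the Earth orbits the Sun", "reliable"),
]

# Simplest possible "model": always predict the most common training label.
majority_label = Counter(label for _, label in train).most_common(1)[0][0]

test = [
    ("the Moon is made of cheese", "unreliable"),
    ("2 + 2 = 4", "reliable"),
]

for text, truth in test:
    prediction = majority_label  # always "reliable", the only label it has ever seen
    print(f"{text!r}: predicted {prediction}, actually {truth}")

# Every unreliable claim slips through, because "unreliable" was never
# a label the model could learn from.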

Even the US government is finding genAI claims going too far. And when the feds say a lie is going too far, that is quite a statement.

FTC: GenAI vendor makes false, misleading claims

The US Federal Trade Commission (FTC) found that one large language model (LLM) vendor, Workado, was deceiving people with flawed claims of the accuracy of its LLM detection product. It wants that vendor to “maintain competent and reliable evidence showing those products are as accurate as claimed.”

Customers “trusted Workado’s AI Content Detector to help them decipher whether AI was behind a piece of writing, but the product did no better than a coin toss,” said Chris Mufarrige, director of the FTC’s Bureau of Consumer Protection. “Misleading claims about AI undermine competition by making it harder for legitimate providers of AI-related products to reach consumers.

“…The order settles allegations that Workado promoted its AI Content Detector as ‘98 percent’ accurate in detecting whether text was written by AI or human. But independent testing showed the accuracy rate on general-purpose content was just 53 percent,” according to the FTC’s administrative complaint. 

“The FTC alleges that Workado violated the FTC Act because the ‘98 percent’ claim was false, misleading, or non-substantiated.”
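
For context on the coin-toss comparison, a quick and purely illustrative simulation (the test set and numbers below are made up, not the FTC's data) shows why 53% is barely above chance: on a balanced mix of AI-written and human-written text, a detector that guesses at random lands near 50%.

import random

random.seed(0)

# Illustrative balanced test set: half AI-written, half human-written.
labels = ["ai"] * 5000 + ["human"] * 5000

# A "detector" that flips a coin for every document.
guesses = [random.choice(["ai", "human"]) for _ in labels]

accuracy = sum(g == t for g, t in zip(guesses, labels)) / len(labels)
print(f"coin-flip accuracy: {accuracy:.1%}")  # roughly 50%, versus the claimed 98%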

There is a critical lesson here for enterprise IT. GenAI vendors are making major claims for their products without meaningful documentation. You think genAI makes stuff up? Imagine what comes out of their vendors’ marketing departments.

