The biggest AI mistake: Pretending guardrails will ever protect you

News Room
Last updated: December 15, 2025 7:22 am

The fact that the guardrails from all the major AI players can be easily bypassed is hardly news. The mostly unaddressed problem is what enterprise IT leaders need to do about it.

Once IT decision-makers accept that guardrails don’t consistently protect anything, the presumptions they make about AI projects are mostly rendered moot. Other techniques to protect data must be implemented.

The reports of guardrail bypasses are becoming legion: poetry disables protections, as does leveraging chat history, inserting invisible characters, and using hexadecimal encoding and emojis. Beyond those, patience and playing the long game, among other tactics, can wreak havoc, affecting just about every generative AI (genAI) and agentic model.

The risks are hardly limited to what attackers can accomplish. The models themselves have shown a willingness to disregard their own protections when they see those protections as an impediment to accomplishing an objective, as Anthropic has confirmed.

To extend the road analogy that gives a guardrail its name: these “guardrails” are not guardrails in the physical, concrete-barrier sense. They are not even strong deterrents, in the speed-bump sense. They are more akin to a single broken yellow line: a weak suggestion, with no enforcement or even serious discouragement.

If I may borrow a line from popular social media video blogger Ryan George in his writer-vs.-producer movie pitch series, an attacker wanting to get around today’s guardrails will find it “super easy, barely an inconvenience.” It’s as if homeowners protected their homes by putting “Do Not Enter” signs on every door while leaving the windows open and the doors unlocked.

So, what should an AI project look like once we accept that guardrails won’t force a model or agent to do what it’s told?

IT has a few options. First, wall off either the model/agent or the data you want to protect. 

“Stop granting AI systems permissions you wouldn’t grant humans without oversight,” said Yvette Schmitter, CEO of the Fusion Collective consulting firm. “Implement the same audit points, approval workflows, and accountability structures for algorithmic decisions that you require for human decisions. Knowing guardrails can’t be relied on means designing systems where failure is visible. You wouldn’t let a hallucinating employee make 10,000 consequential decisions per hour with no supervision. Stop letting your AI systems do exactly that.”
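To make that concrete, here is a minimal sketch of what Schmitter’s audit points and approval workflows could look like in code. Everything in it — the action names, the risk list, the approver callback — is a hypothetical illustration, not anyone’s production design:

```python
import json
import logging
from dataclasses import dataclass
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("agent-audit")

# Actions the agent may take on its own; everything else is escalated.
LOW_RISK_ACTIONS = {"read_public_doc", "summarize", "draft_reply"}

@dataclass
class ProposedAction:
    agent_id: str
    action: str    # e.g. "send_email", "update_record"
    payload: dict  # arguments the agent wants to pass

def requires_human_approval(action: ProposedAction) -> bool:
    # Deny by default: anything not explicitly low-risk needs sign-off.
    return action.action not in LOW_RISK_ACTIONS

def execute_with_oversight(action: ProposedAction, approver=None) -> dict:
    # Every proposed action lands in the audit trail, approved or not.
    audit_log.info(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": action.agent_id,
        "action": action.action,
        "payload": action.payload,
    }))
    if requires_human_approval(action):
        if approver is None or not approver(action):
            return {"status": "pending_review"}
    # Only now is the action actually dispatched (dispatch omitted here).
    return {"status": "executed"}
```

The specific checks matter less than where they live: the gate and the audit trail sit outside the model, where no prompt can rewrite them.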

Gary Longsine, CEO at IllumineX, agreed. He argued that the same defenses enterprises use to block employees from unauthorized data access now need to be deployed against genAI models and AI agents. “The only real thing that you can do is secure everything that exists outside of the LLM,” Longsine said.

Taken to its extreme, that might mean keeping a genAI model in an isolated environment and feeding it only the data you want it to access. It’s not quite an air-gapped server, but it’s close: the model can’t be tricked into revealing data it never had access to in the first place.
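In practice, that isolation is enforced at ingestion: the model is only ever handed documents from an explicit allow-list, so even a successful jailbreak has nothing extra to surface. A minimal sketch, with hypothetical source names and a toy in-memory store standing in for the real document system:

```python
# Toy stand-in for the enterprise document store (hypothetical names).
DOCUMENT_STORE = {
    "dealer_inventory": ["2024 sedan, 12 in stock"],
    "published_pricing": ["Base trim MSRP: $28,500"],
    "hr_records": ["<sensitive: must never reach the model>"],
}

# Only these sources may ever be placed in the model's context window.
ALLOWED_SOURCES = {"dealer_inventory", "published_pricing"}

def load_context(requested_sources: list[str]) -> list[str]:
    """Collect context documents, dropping anything not allow-listed.

    Data the model never receives is data no prompt can exfiltrate.
    """
    docs: list[str] = []
    for source in requested_sources:
        if source not in ALLOWED_SOURCES:
            continue  # never fetched, so never in the context window
        docs.extend(DOCUMENT_STORE.get(source, []))
    return docs

# Even if an attacker's prompt asks for "hr_records", it is never loaded.
print(load_context(["dealer_inventory", "hr_records"]))
```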

Capital One toyed with something similar: it created genAI systems for auto dealerships but gave the large language model (LLM) it used access only to public data. The company also pushed open-source models and avoided hyperscalers, which addressed another guardrail issue: when agents are actively managed by a third-party firm in a cloud environment, your rules don’t necessarily get obeyed. Taking back control might mean literally doing that.

Longsine said some companies could cooperate to build their own data center, but that effort would be ambitious and costly. (He put the price tag at $2 billion, though it could easily cost far more, and it might not even meaningfully address the problem.)

Let’s say five enterprises built a data center that only those five could access. Who would set the rules? And how much would any one of those companies trust the other four, especially when management changes? The companies might wind up replacing a hyperscaler with a much smaller makeshift hyperscaler, and still have the same control problems.

Here’s the painful part: there are many genAI proofs of concept out there today that simply won’t work if management stops believing in guardrails. At the board level, it seems, the Tinkerbell strategy remains alive and well: boards seem to think guardrails will work if only everyone claps their hands loudly enough.

Consider an AI deployment that lets employees query HR information and is supposed to show each employee or manager only the records they are entitled to see. But those apps, and countless others just like them, take the easy coding approach: they grant the model access to all HR data and rely on guardrails to enforce proper access. That won’t work with AI.
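The sturdier pattern is to resolve what the authenticated requester is entitled to see before anything reaches the model, and to pass only that slice into the prompt. A hedged sketch with toy data (a real system would pull entitlements from the existing HR database and identity service):

```python
# Toy HR records and reporting lines; hypothetical data for illustration.
HR_RECORDS = {
    "alice": {"salary": 95000, "manager": "carol"},
    "bob":   {"salary": 88000, "manager": "carol"},
}
DIRECT_REPORTS = {"carol": {"alice", "bob"}}

def records_visible_to(requester: str) -> dict:
    """Return only the HR records this requester is entitled to see.

    The model never receives the full table, so guardrails are not the
    only thing standing between an employee and a colleague's salary.
    """
    visible = {}
    if requester in HR_RECORDS:
        visible[requester] = HR_RECORDS[requester]   # own record
    for report in DIRECT_REPORTS.get(requester, set()):
        visible[report] = HR_RECORDS[report]         # direct reports only
    return visible

def build_prompt(requester: str, question: str) -> str:
    # Only the pre-filtered slice ever enters the context window.
    return f"HR data: {records_visible_to(requester)}\nQuestion: {question}"

# Bob asks about Alice; her record was never placed in the prompt.
print(build_prompt("bob", "What is Alice's salary?"))
```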

I’m not saying guardrails will never work. On the contrary, my observations suggest they do, roughly 70% to 80% of the time. In some better-designed rollouts, that figure might hit 90%.

But that’s the ceiling. And when it comes to protecting data access — especially potential exfiltration to anyone who asks the right prompt — 90% won’t suffice. And IT leaders who sign off on projects hoping that will do are in for a very uncomfortable 2026. 
