One problem the firm has encountered, Tatoris said, is sites that want to allow search engine crawlers but block genAI crawlers. That is easily accomplished in most cases, but "the Google bot is a tricky one, a challenging one right now," because it is difficult, if not impossible, to distinguish the Google search crawler from the Google genAI crawler, Tatoris said.
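For sites taking the robots.txt route, the split typically looks something like the sketch below. The user-agent tokens shown (GPTBot, CCBot, Google-Extended) are ones the respective operators have published; the rest of the layout is an assumption for illustration. Notably, as Google documents it, Google-Extended is a control token honored by the regular Googlebot rather than a separate crawler making its own requests, which is part of what makes the Google case so murky.

    # Allow traditional search crawling
    User-agent: Googlebot
    Allow: /

    User-agent: Bingbot
    Allow: /

    # Block known genAI training crawlers (compliance is voluntary)
    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

    # Google-Extended is read by Googlebot itself, not a distinct bot,
    # so the request traffic from Google looks the same either way
    User-agent: Google-Extended
    Disallow: /

Because robots.txt is purely advisory, a directive like this only works against crawlers that choose to honor it, which is exactly the gap the firms quoted here are wrestling with.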
Consultant Schubert said he doesn't have a good answer for how to protect web assets from AI crawlers. "A lot of people do the 'let's use an LLM to generate trash content to feed trash to the training robots' [tactic], and while I guess that works, I'm not a huge fan," he said. "That's effectively wasting energy to allow someone else to waste energy. Ideally, we'd have clear legislation and judge decisions telling those companies that what they do is not fair use."
Little help from the law
In a vacuum, this situation would be ideal for a class-action lawsuit, because there are lots of victims and the damages are relatively easy to quantify. The web hosting firm could document a site's typical bandwidth costs before and after the genAI crawlers' visits.
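To illustrate how simple that quantification could be, here is a minimal sketch. Every figure in it is invented for the example; none comes from the article.

    # Hypothetical illustration of quantifying crawler-driven bandwidth damages.
    # All numbers below are assumptions, not data from the article.
    gb_per_month_before = 40   # assumed baseline monthly transfer, in GB
    gb_per_month_after = 400   # assumed monthly transfer once genAI crawlers arrived
    cost_per_gb_usd = 0.09     # assumed egress price per GB
    months_affected = 12

    excess_gb = (gb_per_month_after - gb_per_month_before) * months_affected
    damages_usd = excess_gb * cost_per_gb_usd
    print(f"Excess transfer: {excess_gb} GB; estimated damages: ${damages_usd:,.2f}")

Per-site damages computed this way may be modest, which is exactly why the class-action structure, aggregating many small, similar claims, fits the scenario.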