“What you’re going to find on the input and output tokens is a dramatic difference in price,” Suda said. “The build cost is quite low to get into the game, but once you begin using it, the costs go up. Say you have 400 users to start. By year four, you may have 2,000 users.”
“So, what happens? You’re consuming more and your costs go up four times by year four,” he said.
“GenAI is not like Google, but some organizations use it like Google — you go into it and ask a question and get an answer. That doesn’t really happen,” Suda continued. “You get an answer and often think, ‘That’s not quite what I wanted.’ And so that makes them want to ask another question. That can multiply your cost quickly.”
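Suda’s point about growing user counts and follow-up questions can be put into rough numbers. The sketch below is a back-of-the-envelope estimate only: the function name is invented for illustration, every price, token count and query rate is a hypothetical placeholder rather than a vendor figure, and it assumes spend scales linearly with the number of prompts.

```python
def annual_token_cost(users, queries_per_user_per_day, follow_ups_per_query,
                      input_tokens, output_tokens,
                      input_price_per_1k, output_price_per_1k,
                      working_days=250):
    """Estimate yearly token spend. All inputs are illustrative assumptions."""
    # Each initial question is followed by some number of refinement prompts.
    prompts_per_query = 1 + follow_ups_per_query
    # Input and output tokens are priced separately.
    cost_per_prompt = ((input_tokens / 1000) * input_price_per_1k
                       + (output_tokens / 1000) * output_price_per_1k)
    prompts_per_year = users * queries_per_user_per_day * prompts_per_query * working_days
    return prompts_per_year * cost_per_prompt


# Hypothetical inputs, not real vendor pricing.
assumptions = dict(queries_per_user_per_day=10, input_tokens=500, output_tokens=300,
                   input_price_per_1k=0.003, output_price_per_1k=0.015)

year_one = annual_token_cost(users=400, follow_ups_per_query=0, **assumptions)
year_four = annual_token_cost(users=2000, follow_ups_per_query=2, **assumptions)

print(f"Year one:  ${year_one:,.2f}")
print(f"Year four: ${year_four:,.2f}")
```

Under these assumptions, user growth and follow-up prompts multiply each other, which is why a modest pilot budget can be a poor guide to later spend.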
The hidden costs that can add up
Thermo Fisher Scientific’s Kwiecien said one cost that hasn’t been considered involves testing. “Every time you ask a question and test it, that’s a cost,” she said. “I’m not just going to load that 500 times, because that will cost me every time.
“We need to test how often AI gives good answers to common questions like ‘What’s the recruiting process?’ or ‘Where’s my 401(k) info?’” she said. “But each test costs money, so we have to balance accuracy with cost and decide how many times to test to be confident in the results.”
Thermo Fisher is currently using a virtual chatbot from ServiceNow, and hopes to make it more intuitive by adding a genAI layer. As a result, it’s currently eyeing genAI solutions from Microsoft, IBM and others.
Another cost can come with efforts to use genAI in hiring. Amy Ritter, vice president for Talent Acquisition at Thermo Fisher, said the company implemented a genAI-powered hiring app from Phenom to automate parts of its global manufacturing hiring platform. The company then had to invest in job preview videos to show candidates what it’s like to work at Thermo Fisher — covering the environment, required PPE, and key skills — since recruiters weren’t involved early in the process.
The cost of change management is also often overlooked, Ritter said. “We invested time and money visiting sites, engaging leaders, and building buy-in, which paid off with strong adoption at launch,” she said.
Injecting Phenom’s genAI into its HR hiring platform, however, netted big returns, Ritter said. It cut candidate screening time from 16 days to just 7 minutes. Along with automated interview scheduling, Thermo Fisher is cumulatively saving more than 8,000 hours a year in candidate screening and 12,000 hours in scheduling time, and is filling roles 10% faster, Ritter said.
And there are infrastructure costs — the cost of building out, running and maintaining server farms, including managed service, is also often underestimated, according to AWS’s Hennesey. “One insurance customer had 200 [proofs of concept] running, but couldn’t articulate the expected value — most were just experiments. Our advice: clearly define the problem, align it with organizational goals, and measure expected returns,” he said.
Moving from pilot to production can also be a soft spot for costs, as can shifting from on-prem to the cloud; the latter means new services and pricing models that need to be understood and forecast.
AWS’s Bedrock, Microsoft’s Azure AI Studio, Google Cloud’s Vertex AI, IBM’s watsonx.ai and Cohere’s Platform are all fully managed service offerings that let AI developers build apps using top foundation models via a single API, with no infrastructure management needed. “You pay on a per model, on a per region basis,” Hennesey said. “And, then you have to think about tokens.”
Making a “capacity commitment” to a vendor can cut costs. Instead of buying capacity “on demand,” organizations can commit to LLM capacity for a specific period, whether one month or six months, and save up to 60%, Hennesey said.
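Whether a commitment actually pays off depends on how much of the committed capacity gets used. The sketch below compares the two options under a deliberately simplified pricing model. The function names, unit price and usage levels are hypothetical, the 60% discount is the upper bound Hennesey cites, and real vendor terms will differ.

```python
def on_demand_cost(units_used, unit_price):
    """Pay only for what is consumed, at the full list price."""
    return units_used * unit_price


def committed_cost(units_used, units_committed, unit_price, discount):
    """Committed units are billed whether used or not; overflow is on demand."""
    committed_part = units_committed * unit_price * (1 - discount)
    overflow = max(0, units_used - units_committed) * unit_price
    return committed_part + overflow


# Hypothetical: 100 units committed at an assumed price of $1.00 per unit.
for used in (20, 40, 60, 100, 120):
    demand = on_demand_cost(used, unit_price=1.0)
    commit = committed_cost(used, units_committed=100, unit_price=1.0, discount=0.60)
    better = "commitment" if commit < demand else "on demand"
    print(f"{used:>3} units used: on demand ${demand:.2f}, committed ${commit:.2f} -> {better}")
```

In this simplified model the commitment only wins once usage exceeds the committed amount multiplied by one minus the discount, which is why a usage forecast matters before signing.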
The bottom line: there’s still a lot of uncertainty around the cost of genAI projects because the technology is still in its early days — and still evolving.
“I feel like we’re not getting great answers, because people are unsure how it’s going to be used,” Kwiecien said. “And so it’s hard to understand what your usage may look like in the future, because we can’t tell how fast it’s going to take people to flip to that.
“How fast are we going to get the solutions to really answer the way that we want it to answer?” she said.