Air Canada had to refund a customer after its customer service AI hallucinated
In February 2024, a Canadian court ordered Air Canada to partially refund an airline ticket purchased by one of its customers. The customer had asked the airline's customer support AI bot about the conditions under which Air Canada refunds tickets bought in an emergency, after the death of a loved one. The bot answered by hallucinating a refund policy that did not exist.
Air Canada offered the customer a 200 Canadian dollar voucher, but the customer refused and took the case to court. Air Canada fought the complaint, arguing that the customer should have relied on the web pages describing the refund policy rather than on the bot's answers. The judge was clearly not receptive to that argument and held that the bot's words carry the same legal weight as pages written by humans.
Following this ruling, Air Canada appears to have taken its bot offline, despite its significant setup cost. The bot's goals were to reduce customer service costs and improve service quality.
What I find interesting in this anecdote is the unreasonable trust Air Canada placed in its bot. It is well documented, including in the scientific literature, that AIs based on large language models such as ChatGPT are prone to hallucinations: they tend to invent information that does not exist. Yet Air Canada seems to have acted as if this documented limitation did not exist, or was minor enough not to pose a problem.
On Threads, Gergely Orosz mentions a similar anecdote in these two posts:
I enjoy hearing companies use GenAI / LLMs as experiments (that can fail!) to improve developer productivity.
Lately, I'm hearing more stories of even large companies where leadership is treating it as a (desperate) solution that must succeed in increasing productivity.
Like there's ~$10B company, losing money big time, where they are pushing devs to dump what they know into the wiki; and hope their internal LLM can scoop it up and e.g. launch new features in new regions, autonomously, and without the need to have a dev involved.
Ugh.
Anyone who has ever asked a generative AI to write computer code knows that the result has to be reviewed meticulously. Often, the code looks like it will run, but it does not. Or it runs, but does not do what it is supposed to do.
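To make that last failure mode concrete, here is a hypothetical sketch, not taken from any real incident: a short function of the kind an LLM might plausibly produce. It runs without errors and reads reasonably, yet it is wrong on an edge case that only a careful review would catch.

```python
# Hypothetical example of plausible-looking generated code that runs but is wrong.
# Task given to the model: "write a function that tells whether a year is a leap year".

def is_leap_year(year: int) -> bool:
    # Looks reasonable, but the rule is incomplete: century years divisible
    # by 400 (such as 2000) are also leap years, and this version misses them.
    return year % 4 == 0 and year % 100 != 0

print(is_leap_year(2024))  # True  -> correct
print(is_leap_year(2000))  # False -> wrong: 2000 was a leap year
```

Nothing in the output signals the mistake; only a reviewer who knows the rule, or a test covering the edge case, will notice it.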
As with Air Canada and its customer service bot, it is wishful thinking to believe that a technology so prone to hallucination can be used to develop new features without human intervention. In the future, generative AIs may become able to write sufficiently reliable code, or to stop hallucinating refund policies. But in their current form, they are not capable of that.
Generative AIs deserve better than moral panics. But neither do they deserve to be treated as miracle solutions whose well-documented limitations can be ignored. The risk is making costly mistakes that are easy to avoid. I suspect that Air Canada will not be the only organization to make this kind of mistake.