By Prerna Lal, Professor-Information Management at International Management Institute, New Delhi
Hardly a day goes by without a piece of exciting news related to Artificial Intelligence. AI hallucination is one such topic related to Generative AI that has caught a lot of media attention over the last year and a half, simply because the text it produces is sometimes hilarious and sometimes horrifying. Popular AI chatbots, be it ChatGPT from OpenAI or Bard from Google, are prone to hallucinations without exception.
What is AI Hallucination?
To understand this, we must first know how Large Language Models (LLMs), the underlying AI algorithms of Generative AI, work. In simple terms, an LLM is trained on billions and billions of words of text to learn statistical patterns; for every prompt (e.g., a question or a statement), it then predicts the word most likely to come next in the sequence, one word at a time.
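To see the idea concretely, the toy sketch below (in Python) mimics next-word prediction with made-up probabilities over a handful of contexts; a real LLM learns such distributions over a huge vocabulary from its training data, and an occasional low-probability continuation is exactly how a plausible-sounding but wrong statement can slip out.

# Toy illustration of next-word prediction; the probabilities are invented,
# whereas a real LLM learns them from billions of words of training text.
import random

next_word_probs = {
    "the sky is": {"blue": 0.70, "clear": 0.20, "falling": 0.10},
    "the capital of france is": {"paris": 0.95, "lyon": 0.03, "nice": 0.02},
}

def predict_next_word(context: str) -> str:
    """Sample the next word from the learned distribution for this context."""
    probs = next_word_probs[context.lower()]
    words, weights = zip(*probs.items())
    return random.choices(words, weights=weights, k=1)[0]

print(predict_next_word("The sky is"))  # usually "blue", occasionally "falling"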
IBM Corp. defines AI hallucination as a phenomenon wherein a large language model (LLM)—often a generative AI chatbot or computer vision tool—perceives patterns or objects that are non-existent or imperceptible to human observers, creating outputs that are nonsensical or altogether inaccurate.
Some notable cases regarding AI hallucinations
• Chatbot promise.
In 2022, an Air Canada passenger, Jake Moffatt, was told by the airline's chatbot that he could book a full-fare flight to his grandmother's funeral and then apply for a bereavement fare after the flight. When Moffatt later applied for the discount, the airline informed him that the chatbot had been wrong: as per policy, the request for a bereavement fare had to be submitted before the flight, so he would not get the discount. Moffatt filed a case with the British Columbia Civil Resolution Tribunal to get his money back. In February 2024, the tribunal ruled in his favour, saying that Air Canada is responsible for all the information on its website, regardless of whether it comes from a static page or a chatbot. The airline had to pay Moffatt $812.02 (£642.64) in damages and tribunal fees.
• Legal cases that never happened.
In February 2022, Roberto Mata sued Colombian airline Avianca, alleging he was injured when a metal serving cart struck his knee during a flight to Kennedy International Airport in New York. Avianca asked a Manhattan judge to toss out the case. Mata's lawyers, Steven Schwartz and Peter LoDuca of the firm Levidow, Levidow & Oberman, objected by submitting a legal brief citing six court decisions as precedent. Schwartz had used ChatGPT to prepare the brief, but there was a catch: he had no idea about the propensity of AI to invent facts. Neither Avianca nor the judge could find a single case cited in the brief, because the AI had fabricated them. In June 2023, the court fined the two lawyers and the firm a total of $5,000 as a deterrent, with the judge cautioning that AI-generated citations must be checked for accuracy.
Types of AI hallucinations and their impact on organisations
AI hallucination can take many forms, ranging from text with factual errors and fabricated information to harmful misinformation and downright weird responses. Organisations should be especially concerned about hallucinations. A few incorrect responses may not have a significant impact on an individual using a generative AI tool; they may laugh about them or share them on social media for fun. For organisations, however, it is different, as a lot more is at stake.
How can we forget the much-hyped announcement of Bard, Google's AI-powered chatbot, whose first promotional video made an incorrect claim about the James Webb Space Telescope? The blunder had serious financial implications for Google's parent company, Alphabet, which lost $100 billion in market value on the same day.
Consider a scenario wherein the LLM used by an organisation starts providing fabricated information and employees use it to make business decisions. Mata v. Avianca is an example of how fabricated information can harm an organisation's reputation and raise ethical concerns. In addition, it results in a loss of time and money to rework the process.
Misleading text generated by LLMs can also pose a challenge for organisations; for example, a chatbot promising something that is against organisational policy or telling customers about a product or service feature that does not exist. This undermines the organisation's credibility and creates a customer trust deficit, the repercussions of which are lost customer loyalty and difficulty in retaining customers.
Sometimes, AI hallucinations may simply leave organisations embarrassed with weird and creepy responses. For example, in 2018, WestJet's chatbot sent a link to a suicide prevention hotline to a happy customer for no obvious reason.
In sum, AI hallucinations can expose an organisation to reputational damage as well as financial, operational, and legal risks.
How do we mitigate AI hallucinations?
AI hallucinations occur due to several underlying issues within the AI's learning process and architecture. LLMs learn their patterns from the large datasets they are trained on, so the training data must be free of biases, misinformation, and contradictory statements. Further, organisations should be careful when using generic LLMs like ChatGPT out of the box; instead, they need to ensure that the model is grounded in a knowledge base relevant to their domain. AI research companies have been trying to find ways to deal with these challenges, and Retrieval Augmented Generation (RAG) techniques seem promising.
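Before turning to RAG itself, the curation step can be illustrated with a minimal sketch; the document fields, domain label, and filter rule below are hypothetical stand-ins for whatever expert-review workflow an organisation actually uses.

# Hypothetical curation of a domain knowledge base: keep only documents that
# an expert has reviewed and that belong to the organisation's own domain.
documents = [
    {"text": "Bereavement fare requests must be submitted before travel.",
     "domain": "airline-policy", "reviewed": True},
    {"text": "Unverified forum post speculating about refunds.",
     "domain": "forum", "reviewed": False},
    {"text": "Checked baggage allowance on international flights is 23 kg.",
     "domain": "airline-policy", "reviewed": True},
]

def curate(docs, allowed_domain="airline-policy"):
    """Return only expert-reviewed documents from the relevant domain."""
    return [d for d in docs if d["reviewed"] and d["domain"] == allowed_domain]

knowledge_base = curate(documents)
print(len(knowledge_base), "documents retained for grounding")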
RAG is a technique that combats AI hallucinations by providing factual grounding. It involves a comprehensive search of an organisation's private data sources, curated and reviewed by domain experts, to identify relevant information that supplements the LLM's general knowledge. The LLM can then anchor its responses in actual data, significantly reducing the risk of generating spurious outputs unsupported by evidence. By doing so, RAG enhances the reliability and accuracy of the LLM's output, which is paramount in business, where data-driven decisions are critical.
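A minimal sketch of the retrieval-and-grounding idea follows; it uses simple keyword overlap in place of the vector search a production RAG system would rely on, and the policy snippets and prompt wording are illustrative assumptions rather than any particular product's API.

# Toy RAG: retrieve the most relevant company documents for a query and build
# a prompt that instructs the LLM to answer only from that retrieved context.
def retrieve(query: str, knowledge_base: list, top_k: int = 2) -> list:
    """Rank documents by shared words with the query (stand-in for vector search)."""
    query_words = set(query.lower().split())
    scored = sorted(
        knowledge_base,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:top_k]

def build_grounded_prompt(query: str, knowledge_base: list) -> str:
    """Anchor the LLM's answer in the organisation's own curated documents."""
    context = "\n".join(retrieve(query, knowledge_base))
    return (
        "Answer using only the context below. If the context does not contain "
        "the answer, say you do not know.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

knowledge_base = [
    "Bereavement fare requests must be submitted before the date of travel.",
    "Refunds are processed within 30 business days.",
    "Checked baggage allowance on international flights is 23 kg.",
]

print(build_grounded_prompt("Can I claim a bereavement fare after my flight?", knowledge_base))

The grounded prompt, rather than the bare question, is what gets sent to the LLM, so the model's answer is tied to documents the organisation has vetted.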
Ensure verification before application
It is crucial to remember that hallucination is an inherent limitation of LLMs and cannot be cured 100 per cent. However, it can be reduced through rigorous testing, verification, and validation at every stage of the LLM pipeline using a human-in-the-loop approach. Incorporating these measures increases the reliability and accuracy of the outputs generated by LLMs. A mindful and prudent approach to implementing LLMs helps organisations make informed decisions.
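As a closing illustration, the sketch below shows one way a human-in-the-loop gate might sit between the model and the customer; the approved-policy list and the verification check are deliberately simplistic assumptions, not a production design.

# Hypothetical review gate: release a model answer only if it matches an
# approved policy line; otherwise route it to a human reviewer first.
def verify_against_policy(answer: str, approved_policies: list) -> bool:
    """Toy check: the answer must contain an approved policy statement."""
    return any(policy.lower() in answer.lower() for policy in approved_policies)

def review_gate(answer: str, approved_policies: list) -> str:
    if verify_against_policy(answer, approved_policies):
        return f"RELEASE: {answer}"
    return f"HOLD FOR HUMAN REVIEW: {answer}"

approved_policies = ["bereavement fare requests must be submitted before travel"]

print(review_gate("Bereavement fare requests must be submitted before travel.", approved_policies))
print(review_gate("You can claim the bereavement fare after your flight.", approved_policies))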