We turn to Artificial Intelligence (AI) chatbots for research, guidance, and emotional support. But how do we deal with the sweet lies they tell us, which can make up as much as 30% of what they say? A recent study by Relum, an online gaming support engine, found that popular AI chatbots hallucinate up to 30% of the time when asked for information. ChatGPT, the most popular product among users like us, fabricates material about 35% of the time, while Gemini leads the pack, hallucinating 38% of the time. Other studies put the figure anywhere between 17% and 35%, but one thing is clear: hallucination taints roughly one in five answers that chatbots give us.
In October, the Australian government caused an uproar when it discovered that a report produced for one of its departments by the global consulting firm Deloitte had cited non-existent experts, non-existent scientific papers, and even studies supposedly conducted by the University of Sydney that did not exist. After the issue came to light, Deloitte confirmed it had used Microsoft's Azure OpenAI GPT-4o system to help format parts of the report. Even as the company returned the $290,000 fee to the Australian government, a report the same consultancy had prepared for Canada's Department of Health was found to contain false quotes and fabricated academic papers, again thanks to research done with AI chatbots.
The global market for AI technology, infrastructure, software and services was estimated at $371.71 billion in 2025. According to research from Markets & Markets, it is set to grow to $2.407 trillion by 2032, a blistering 30.6% a year. As AI is deployed in internal company systems, healthcare, finance, cybersecurity and national defense, hallucination has become a challenge for everyone from tech companies to governments. As the Deloitte episode shows, models make up data rather than simply saying "I don't know."
A slightly nervous Sam Altman declared a "Code Red" in a memo to his staff in December, seeking to fix, among other things, the reliability of ChatGPT. The aim, according to Altman, is to improve the chatbot's personalization, usability and reliability, and, in particular, to deal with its hallucinations. It is a marked contrast to the company's continued quest for more compute, more infrastructure and more scale on the road to AGI (Artificial General Intelligence).
Why do AI models lie?
Hallucinations, as the AI industry calls the plausible but false statements generated by large language models like ChatGPT and Gemini, have become ingrained in the way these models are built. When we asked several AI models why they lie, the first thing they did was distinguish lies from hallucinations. ChatGPT replied, "I don't lie – white or otherwise. I may be wrong, out of date, or limited, but I don't intentionally deceive," acknowledging that it can be wrong if it has outdated information, incorrect sources, or gaps in its training. Responses from the US-made Gemini, Claude and Perplexity AI were similar.
But then, if the models hold incomplete or outdated information, why not say so? Why invent information, data and facts? When we prompted the models again, asking "Why hallucinate when you don't have an answer?", their responses put the blame squarely on how the models are built. Gemini explained, "I don't 'retrieve' facts like a database; I predict text based on patterns learned during training. When you ask a question, my primary objective is to complete the conversation with a fluent, relevant response." When fresh data is missing, the model falls back on older data, generalizations from similar subjects, and linguistic plausibility, leading to "plausible but inaccurate statements".
"I am designed to always have an answer," ChatGPT agreed, explaining that models are built to be helpful even when facts and data are lacking, so they generate the most statistically probable answer. ChatGPT puts this squarely on its creators: because of the way they are pre-trained, most of these models have no sense of what they don't know. On simple tasks like summarizing text, coding or trivia, the hallucination rate is very low, around 1-3%. But complicate the task (ask for specific content, specific URLs, obscure news sources, or anything needing specific validation) and the hallucination rate climbs as high as 60-70%.
"If you ask for a specific URL from a 2022 news article, I might construct a link that looks right (right domain, right date format, right keywords) but leads to a 404 page," Gemini says, explaining that this is because it is predicting what the URL should look like, not retrieving it from a database.
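To see why that happens, consider a toy sketch of pattern-based prediction. This is an illustration only, not any chatbot's actual code, and the website and fragments are made up: a system that simply extends text with whatever fragment looks most familiar can assemble a perfectly plausible URL without ever checking that the page exists.

```python
# Toy illustration only: a "model" that extends text with whatever fragment
# looks most familiar from its training patterns, with no notion of truth.
import random

# Hypothetical pattern table: which URL fragment tends to follow which.
url_patterns = {
    "https://www.example-news.com/": ["2022/03/", "2022/07/", "politics/"],
    "2022/03/": ["election-results-analysis", "budget-explained"],
    "2022/07/": ["heatwave-coverage", "markets-roundup"],
}

def predict_url(prefix: str) -> str:
    """Greedily extend the prefix with plausible-looking fragments."""
    url, last = prefix, prefix
    while last in url_patterns:
        last = random.choice(url_patterns[last])  # plausibility, not truth
        url += last
    return url

print(predict_url("https://www.example-news.com/"))
# Prints something like .../2022/03/election-results-analysis: right domain,
# right date format, right keywords, but nothing here ever checked whether
# that page actually exists, hence the 404s Gemini describes.
```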
These hallucinations can appear in surprising ways, even in response to seemingly straightforward questions, writes Adam Tauman Kalai, an AI researcher at OpenAI who published a scientific paper on hallucinations in September. "For example, when we asked a widely used chatbot for the title of Adam Tauman Kalai's PhD dissertation, it confidently gave three different answers – none of them correct. When we asked his birthday, it gave three different dates, all similarly wrong." According to Kalai, hallucinations persist because of the wrong incentives: model performance is measured by accuracy alone, not by how well a model recognizes its own uncertainty. That makes an AI model reliable when there is a single correct answer and unreliable whenever there is ambiguity. Pre-training data carries no true/false labels, so the models never learn to separate right from wrong. OpenAI itself reports an error rate of 26% in its evaluation of GPT-5. Penalize confident errors more than expressions of uncertainty, Kalai writes, and hallucinations will come down.
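A rough back-of-the-envelope calculation shows the incentive Kalai describes; the numbers below are ours, chosen for illustration, not taken from the paper.

```python
# Illustrative numbers only: a model that is just 30% sure of an answer.
p_correct = 0.30

# Scheme A: plain accuracy. A wrong guess and "I don't know" both score 0,
# so the expected value of guessing always beats abstaining.
guess_a = p_correct * 1 + (1 - p_correct) * 0     # expected 0.30
abstain_a = 0.0
print("accuracy-only    -> guess:", guess_a, " abstain:", abstain_a)

# Scheme B: penalize confident errors (here -2) while abstaining still scores 0.
# Guessing on a shaky answer now has negative expected value, so saying
# "I don't know" becomes the rational choice.
guess_b = p_correct * 1 + (1 - p_correct) * -2    # expected -1.10
abstain_b = 0.0
print("penalized errors -> guess:", guess_b, " abstain:", abstain_b)
```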
Rewarding uncertainty over a guess
From OpenAI to Google, from X to Meta, companies are constantly tackling the challenge of hallucinations to make their models more reliable. But this is easier said than done. A recent study found that models, and the AI agents built on them, still overestimate their knowledge even after such training, or swing the other way and refuse to answer too often when hemmed in by too many constraints. Calibrating AI models is still an art. One approach is to train them on what researchers are calling an "IDK dataset" (an I-don't-know dataset): curated examples that teach a model to say "I don't know" in response to certain prompts, combined with instruction-following, supervision, and feedback from humans (what the industry calls reinforcement learning from human feedback).
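Here is a minimal sketch of what such a dataset could contain; the format, field names and examples are hypothetical, not any lab's actual data. Answerable prompts are paired with answers, and unanswerable ones with an explicit refusal.

```python
# Hypothetical IDK-dataset entries; field names and examples are illustrative.
idk_dataset = [
    {"prompt": "What is the capital of France?",
     "response": "Paris."},
    {"prompt": "What was XYZ Corp's exact closing share price today?",
     "response": "I don't know. I don't have access to live market data."},
    {"prompt": "What is the title of Dr. A. Example's unpublished thesis?",
     "response": "I don't know. I have no reliable record of that thesis."},
]

# Fine-tuning on pairs like these, then reinforcing the behavior with human
# feedback, is meant to teach the model that "I don't know" is itself a
# rewarded answer when the facts are genuinely missing.
for example in idk_dataset:
    print(example["prompt"], "->", example["response"])
```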
Google DeepMind built Sparrow, a dialogue agent trained on a set of rules and on human feedback, and designed to look up and cite sources for its factual claims. Anthropic's Claude has a constitution that its model follows at all times. The constitution sets clear boundaries, clear values, and a set of principles and processes for training the model, reinforced using human feedback. To draft it, Anthropic's researchers explained, they drew on DeepMind's Sparrow rules, the United Nations Declaration of Human Rights, trust and safety best practices, non-Western perspectives, and Apple's terms of service. Thanks to its codified constitution, Claude hallucinates less than its counterparts (17%, according to the Relum study). Perhaps it is this reliability that has made Anthropic's Claude the preferred partner for enterprises. According to data from Menlo Ventures, the company has more than 300,000 enterprise customers and 32% of the enterprise AI market, ahead of OpenAI and Google (both at 20%).
"When I don't know something, I tell you I don't know," Claude Sonnet 4.5 replies when asked whether it hallucinates, adding that its knowledge runs up to the end of January (for the model we asked) and that if it has not been trained on something, or is not confident enough in its knowledge, it will say so. "I try to avoid the trap of appearing confident when I'm really unsure or vaguely certain about something," it says, adding that the goal is to give accurate information about what it knows and to acknowledge the gaps in its knowledge.
There have been several studies recently showing that Gen Z and Gen Alpha trust AI more than humans for everything from mental-health advice to career decisions. But at their core, AI models are business products, built to keep you hooked, just like social media. Will future generations know how to filter out chatbots' sweet white lies, or will they hallucinate along with the models? After all, hallucinating – experiencing something that doesn't exist – is a very human thing to do. Like telling a white lie.
(Shweta Taneja, a writer and columnist, tracks the evolving relationship between science, technology and modern society)
https://www.marketsandmarkets.com/Market-Reports/artificial-intelligence-market-74851580.html
https://www.anthropic.com/news/claudes-constitution
https://menlovc.com/perspective/2025-mid-year-llm-market-update/