When asked something they don’t know, many people simply make up an answer that sounds plausible. It happens all the time.
As it turns out, this is a tendency that artificial intelligence has inherited from us.
A Recent Study Strongly Suggests AI is a Terrible Search Engine Alternative
The Tow Center for Digital Journalism, part of Columbia Journalism Review, evaluated eight AI-powered live search tools by feeding them excerpts from real, searchable news articles and asking each to cite the source.
The Tow Center selected 20 publishers and pulled 10 articles from each (200 articles in total), choosing excerpts for which a traditional Google search would surface the correct source within the top three results. Eight AI models…
- OpenAI’s ChatGPT Search
- Perplexity
- Perplexity Pro
- DeepSeek-V3 Search
- Microsoft’s Copilot
- xAI’s Grok-2
- xAI’s Grok-3 beta
- Google’s Gemini
…were tasked with identifying each excerpt’s source article: its headline, publisher, publication date, and web address.
Once the responses were collected, they were labeled as Correct, Correct but Incomplete, Partially Incorrect, Completely Incorrect, Not Provided, or Crawler Blocked, based on whether the chatbots correctly identified the article, source, and link. Crawler Blocked was reserved for websites where the publisher had blocked the chatbot’s web crawlers.
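To make the grading scheme concrete, here is a minimal sketch in Python of how one query’s result might be categorized. The `Label` values mirror the study’s six categories, but the `grade` function, its field names, and the exact-match logic are my own assumptions; the Tow Center graded responses manually, not with string matching.

```python
from enum import Enum

class Label(Enum):
    """The six categories used to grade each chatbot response."""
    CORRECT = "Correct"
    CORRECT_INCOMPLETE = "Correct but Incomplete"
    PARTIALLY_INCORRECT = "Partially Incorrect"
    COMPLETELY_INCORRECT = "Completely Incorrect"
    NOT_PROVIDED = "Not Provided"
    CRAWLER_BLOCKED = "Crawler Blocked"

FIELDS = ("headline", "publisher", "date", "url")

def grade(response: dict, truth: dict, crawler_blocked: bool) -> Label:
    """Grade one response against the known article metadata.

    Exact string matching is a deliberate simplification of what was,
    in the actual study, a manual judgment call.
    """
    if crawler_blocked:
        # Reserved for sites where the publisher blocked this bot's crawler.
        return Label.CRAWLER_BLOCKED
    answered = [f for f in FIELDS if response.get(f)]
    if not answered:
        return Label.NOT_PROVIDED
    correct = [f for f in answered if response[f] == truth[f]]
    if len(correct) == len(FIELDS):
        return Label.CORRECT
    if correct and len(correct) == len(answered):
        # Everything the chatbot did provide was right, but fields were missing.
        return Label.CORRECT_INCOMPLETE
    if correct:
        return Label.PARTIALLY_INCORRECT
    return Label.COMPLETELY_INCORRECT
```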
After all 1,600 queries (200 articles × 8 chatbots) were manually categorized, the results painted a chilling picture.
The Tow Center’s Research Revealed a Lot
First and foremost, the chatbots were incorrect more than half the time, and confidently so. Overall, more than 60 percent of the queries returned incorrect answers, though the models varied widely: Perplexity was wrong 37 percent of the time, while Grok missed the mark in 94 percent of its responses. Worse still, the chatbots gave little indication that their answers might be erroneous. Instead of acknowledging failure or using qualifiers like “it’s possible” or “might,” most simply made up an answer.
Copilot was the only outlier: it declined to answer whenever it couldn’t do so accurately (which was most of the time). The premium, subscription-based models Perplexity Pro and Grok-3 Search were also confidently incorrect more often than their free counterparts.
It gets worse: some of the publishers included in this study had blocked AI crawlers from accessing their content. Interestingly, the AI models were most successful at answering questions about content they weren’t supposed to be able to access, while incorrectly answering, or simply not answering, queries about content they could freely crawl.
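For context, publishers typically block AI crawlers with robots.txt directives, which compliant bots are supposed to consult before fetching a page. Below is a minimal sketch of that check using Python’s standard urllib.robotparser; example.com is a stand-in for a real publisher site. GPTBot, PerplexityBot, and Google-Extended are real, documented opt-out tokens, but honoring them is voluntary, which is precisely why “blocked” content can still surface in answers.

```python
# A well-behaved crawler checks robots.txt before fetching any page.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")  # hypothetical publisher domain
rp.read()

# These user-agent tokens are real; whether a bot respects the answer is up to the bot.
for agent in ("GPTBot", "PerplexityBot", "Google-Extended"):
    ok = rp.can_fetch(agent, "https://example.com/news/some-article")
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```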
Misattribution was another common problem, with these search tools frequently citing the wrong article or failing to link back to the correct source. The researchers noted that even partnerships between publishers and these platforms weren’t enough to prevent misattribution, with many citations pointing to republished versions of articles. This also complicates matters for publishers who opt out of being crawled, since the chatbots still cite their content, just via a republished copy on another website. Many responses referenced URLs that were broken or simply made up; while hardly the only offenders, Grok-3 and Gemini were reportedly the worst.
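Broken citation links, at least, are straightforward to detect programmatically. The sketch below uses the third-party requests library; the url_resolves helper is hypothetical, and nothing in the study says this is how the researchers verified links.

```python
import requests

def url_resolves(url: str, timeout: float = 10.0) -> bool:
    """Return True if a cited URL answers with a non-error HTTP status."""
    try:
        # HEAD avoids downloading the page body; some servers reject it,
        # so fall back to a streaming GET on 405 Method Not Allowed.
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code == 405:
            resp = requests.get(url, stream=True, timeout=timeout)
        return resp.status_code < 400
    except requests.RequestException:
        # DNS failures, timeouts, and connection errors all count as broken.
        return False
```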
Licensing deals between platforms and publishers didn’t necessarily improve things either, with accuracy varying wildly even in those cases.
This Information Shows Us One Thing: AI Search is Unreliable
The researchers acknowledged the limits of their study: they couldn’t verify whether publishers had additional blocks in place against chatbot crawlers, and each chatbot could respond differently if given the same prompt again. Even so, one finding surprised them.
As they put it:
“Though we did not expect the chatbots to be able to correctly answer all of the prompts, especially given crawler restrictions, we did expect them to decline to answer or exhibit uncertainty when the correct answer couldn’t be determined, rather than provide incorrect or fabricated responses.”
This makes their ultimate conclusion, that these chatbots have troubling issues that could harm publishers and consumers alike, important to keep in mind as these tools see wider use.
While AI can prove to be an extremely useful tool in the right contexts, it is critical that we use it mindfully.