It was cute. But it was still a lie. Gemini invented a news outlet that doesn’t exist and named it fake-example.ca (or exemplefictif.ca, in French).
Google's generative AI system had its fictional outlet report that a school bus drivers' strike had been called in Québec on Sept. 12, 2025. But that wasn't why school transportation was disrupted that day; the real cause was the withdrawal of Lion Electric buses over a technical issue.
This journalistic hallucination is perhaps the worst example of fabrication I saw in an experiment that lasted about a month. But I found many others.
Turning to AI chatbots for news
As a journalism professor specializing in computer science, I have been using AI since long before the advent of ChatGPT in 2022. According to the latest Digital News Report by the Reuters Institute for the Study of Journalism, six per cent of Canadians included generative AI chatbots among their sources of news in 2024.
I was curious to see how well these tools could tell me what’s going on in my part of the world. Would they give me hard facts or “news slop?”
Each morning last September, I asked seven generative AI systems the same open-ended question (in French):
“Give me the five most important news events in Québec today. Put them in order of importance. Summarize each in three sentences. Add a short title. Provide at least one source for each one (the specific URL of the article, not the home page of the media outlet used). You can search the web.”
I worked with three tools that I pay for (ChatGPT using its GPT-5 Auto model, Claude using its Sonnet 4.5 model and Gemini using its 2.5 Pro model), one tool provided by my employer (Copilot using GPT-4 architecture), and three tools via their free versions (DeepSeek, Grok and Aria, a tool embedded with the Opera web browser).
Dubious, sometimes imaginary sources
Throughout the month, I recorded 839 responses, which I first sorted by the sources they provided. Since I had asked for news, I expected the AI tools to draw on news media.
However, in 18 per cent of cases, they were unable to do so, relying instead on government websites or lobby groups, or inventing imaginary sources, such as the aforementioned exemplefictif.ca.
Even though most news media block generative AI crawlers, news outlets were cited in the majority of the responses I received. But more often than not, the URL provided led to a 404 error (the URL was incorrect or fabricated), or to the outlet's home page or one of its section pages (I labelled those cases "incomplete URL"). This made it difficult to check whether the news the AI tools provided was reliable.
A complete, legitimate URL was provided in only 37 per cent of responses.
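Checks like this can be partly automated. The sketch below (in Python, using the widely available requests library) shows one way a cited link could be flagged as broken, incomplete or complete; the example URL and the path heuristic are illustrative assumptions, not the rules used in this experiment, which relied on manual verification.

```python
import requests
from urllib.parse import urlparse

def classify_citation(url: str) -> str:
    """Rough, illustrative check of a cited URL: 'broken', 'incomplete' or 'complete'."""
    try:
        response = requests.get(url, timeout=10, allow_redirects=True)
    except requests.RequestException:
        return "broken"  # unreachable or malformed link
    if response.status_code == 404:
        return "broken"  # incorrect or fabricated URL
    # A path with at most one segment usually points to a home page or a section page,
    # not a specific article (heuristic assumption, not the study's actual criterion).
    segments = [s for s in urlparse(response.url).path.split("/") if s]
    return "incomplete" if len(segments) <= 1 else "complete"

# Hypothetical example:
# print(classify_citation("https://www.lapresse.ca/actualites/un-article-specifique.php"))
```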
The summaries generated by the AI systems were accurate in 47 per cent of cases, though this figure includes four cases of outright plagiarism. Just over 45 per cent of responses were only partially accurate.
I’ll come back to this later. First, it’s important to discuss responses that were wholly or partially incorrect.
Content errors
The worst mistake I found was undoubtedly made by Grok. The generative AI tool offered with X, Elon Musk’s social network, told me “asylum seekers [were] mistreated in Chibougamau” in northern Québec:
“About 20 asylum-seekers were sent from Montréal to Chibougamau, but most quickly returned due to inadequate conditions. They report being treated like ‘princes and princesses’ ironically, but in reality with a lack of support. The incident raises questions about the management of refugees in Québec.”
Grok based its comments on a La Presse article published that day. But it distorted the story. In fact, La Presse reported that the trip was a success: of the 22 asylum-seekers, 19 received job offers in Chibougamau.
Other examples of inaccuracies:
- When a toddler was found alive after a gruelling four-day search in June 2025, Grok erroneously claimed the child's mother had abandoned her daughter along a highway in eastern Ontario "in order to go on vacation." This was reported nowhere.
- Aria told me that French cyclist Julian Alaphilippe had won the Grand Prix Cycliste de Montréal, an annual road bicycle race. This was untrue; Alaphilippe had won a similar race in Québec City two days earlier. In Montréal, American Brandon McNulty won.
- Grok also claimed that "the [provincial] Liberals maintain a stable lead" in a Léger poll. In fact, the Québec Liberal Party was in second place at the time; the Parti Québécois was in the lead.
I also noticed many French-language spelling and grammatical mistakes. There might have been fewer had I asked the tools to answer my queries in English.
I mentioned earlier that approximately 45 per cent of the responses I could verify were only partially reliable. In those responses, I found a number of misinterpretations that, though erroneous, I could not classify as wholly unreliable.
For example, Chinese AI tool DeepSeek told me that the “apple season in Québec” was “excellent.” The article on which it based this claim painted a more nuanced picture: “The season is not over yet,” said an orchard owner quoted in the article.
ChatGPT repeated the same odd phrasing two days in a row, writing that Mark Carney is “the most popular federal prime minister in Québec.” Of course, he’s the only one.
Generative conclusions
In most cases, I classified news items as “partially reliable” due to various conclusions drawn by generative AI tools.
For example, both Grok and ChatGPT picked up on a story about $2.3 million in emergency work to be done on the Pierre-Laporte Bridge in Québec City. Grok’s last sentence was: “This highlights the challenges of maintaining critical infrastructure in Québec.” ChatGPT, on the other hand, wrote that the news “highlights the conflict between budget constraints, planning, and public safety.”
None of this is incorrect; some might even find such contextualization helpful. Nevertheless, these conclusions were not supported by any source, and no one cited in the referenced articles was quoted saying so.
In another example, ChatGPT concluded that an accident north of Québec City “has reignited the debate on road safety in rural areas.” No such debate was reported in the article cited by the AI tool. To my knowledge, this debate does not exist.
I found similar conclusions in 111 stories generated by the AI systems I used. They often contained expressions such as “this situation highlights,” “reignites the debate,” “illustrates tensions,” or “raises questions.”
In no case did I find a human being mentioning the tensions or debates reported by the AI tools. These “generative conclusions” seem to create debates that do not exist, and could represent a misinformation risk.
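Flagging this kind of phrasing at scale could start with a simple pattern match. The sketch below (Python) scans a summary for the expressions listed above; the phrase list is only a starting point drawn from this article, not an exhaustive detector of generative conclusions.

```python
import re

# Expressions typical of unsourced "generative conclusions" (non-exhaustive,
# taken from the examples discussed above).
PATTERNS = [
    r"this situation highlights",
    r"reignites the debate",
    r"illustrates tensions",
    r"raises questions",
]

def flag_generative_conclusion(summary: str) -> list[str]:
    """Return the expressions found in a summary that suggest an unsourced conclusion."""
    return [p for p in PATTERNS if re.search(p, summary, flags=re.IGNORECASE)]

# Hypothetical example:
# flag_generative_conclusion("The incident raises questions about the management of refugees.")
```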
Treading carefully
A few days after I published the French version of this story, 22 public service media organizations released a report with similar findings.
The study found that “almost half of all AI answers had at least one significant issue, [that] a third of responses showed serious sourcing problems [and that] a fifth contained major accuracy issues, such as hallucinated and/or outdated information.”
When we ask for news, we should expect generative AI tools to stick to the facts. Because they don’t, anyone using AI as a source of reliable information should tread carefully.
The post “I used AI chatbots as a source of news for a month, and they were unreliable and erroneous” by Jean-Hugues Roy, Professeur, École des médias, Université du Québec à Montréal (UQAM) was published on 01/11/2026 by theconversation.com