AI Translations Are Adding ‘Hallucinations’ to Wikipedia Articles

Wikipedia editors have implemented new policies and restricted a number of contributors who were paid to use AI to translate existing Wikipedia articles into other languages, after discovering that these AI translations added “hallucinations,” or errors, to the resulting articles.

The new restrictions show how Wikipedia editors continue to fight to keep the flood of generative AI across the internet from diminishing the reliability of the world’s largest repository of knowledge. The incident also reveals how even well-intentioned efforts to expand Wikipedia are prone to errors when they rely on generative AI, and how Wikipedia’s open governance model remedies those errors.

The issue in this case starts with an organization called the Open Knowledge Association (OKA), a non-profit organization dedicated to improving Wikipedia and other open platforms.

“We do so by providing monthly stipends to full-time contributors and translators,” OKA’s site says. “We leverage AI (Large Language Models) to automate most of the work.”

The problem is that editors started to notice that some of these translations introduced errors to articles. For example, a draft translation for a Wikipedia article about the French royal La Bourdonnaye family cites a book and specific page number when discussing the origin of the family. A Wikipedia editor, Ilyas Lebleu, who goes by Chaotic Enby on Wikipedia, checked that source and found that the specific page of that book “doesn’t talk about the La Bourdonnaye family at all.”

“To measure the rate of error, I actually decided to do a spot-check, during the discussion, of the first few translations that were listed, and already spotted a few errors there, so it isn’t just a matter of cherry-picked cases,” Lebleu told me. “Some of the articles had swapped sources or added unsourced sentences with no explanation, while 1879 French Senate election added paragraphs sourced from material completely unrelated to what was written!”

As Wikipedia editors looked at more OKA-translated articles, they found more issues.

“Many of the results are very problematic, with a large number of […] editors who clearly have very poor English, don’t read through their work (or are incapable of seeing problems) and don’t add links and so on,” a Wikipedia page discussing the OKA translation said. The same Wikipedia page also notes that in some cases the copy/paste nature of OKA translators’ work breaks the formatting on some articles.

Wikipedia editors investigated how OKA was operating and found that it was mostly relying on cheap labor from contractors in the Global South, and that these contractors were instructed to copy/paste articles to popular LLMs to produce translations.

For example, a public spreadsheet used by OKA translators to keep track of what articles they’re translating instructs them to “pick an article, copy the lead section into Gemini or chatGPT, then review if some of the suggestions are an improvement to readability. Make edits to the Wiki articles only if the suggestions are an improvement and don’t change the meaning of the lead. Do not change the content unless you have checked that what Gemini says is correct!”

Lebleu told me, and other editors have noted in their public on-site discussion of the issue, that these same instructions previously told OKA translators to use Grok, Elon Musk’s LLM, for the same purpose. Grok, which also produces an entirely automated alternative to Wikipedia called Grokipedia, is prone to errors precisely because it does not use humans to vet its output.

“The use of Grok proved controversial, notably given the reasons for which Grok has been in the news recently, and a recent in-house study showed ChatGPT and Claude perform more accurately, leading them to switch a few days ago, although they still recommend Grok as ‘valuable for experienced editors handling complex, template-heavy articles,’” Lebleu told me.

Ultimately the editors decided to implement restrictions against OKA translators who make multiple errors, but not block OKA translation as a rule.

“OKA translators who have received, within six months, four (correctly applied) warnings about content that fails verification will be blocked without further warning if another example is found,” the Wikipedia editors wrote. “Content added by an OKA translator who is subsequently blocked for failing verification may be presumptively deleted […] unless an editor in good standing is willing to take responsibility for it.”

A job posting for a “Wikipedia Translator” from OKA offers $397 a month for working up to 40 hours per week. The job listing says translators are expected to publish “5-20 articles per week (depending on size).”

“They leverage machine translation to accelerate the process. We have published over 1500 articles and the number grows every day,” the job posting says.

“Given this precarious status, I am worried that more uncertainty in the translator duties may lead to an overloading of responsibilities, which is worrying as independent contractors do not necessarily have the same protections as paid employees,” Lebleu wrote in the public Wikipedia discussion about OKA.

Jonathan Zimmermann, the founder and president of OKA, who goes by 7804j on Wikipedia, told me that translators are paid hourly, not per article, and that there is no fixed article quota.

“We emphasize quality over speed,” Zimmermann told me in an email. “In fact, some of the problematic cases involved unusually high output relative to time spent, which in retrospect was a warning sign. Those cases were driven by individual enthusiasm and speed rather than institutional pressure.”

Zimmermann told me that “errors absolutely do occur,” but that OKA’s process includes human review and requires translators to check their content against cited sources, and that “senior editors periodically review samples, especially from newer translators.”

“Following the recent discussion, we have strengthened our safeguards,” Zimmermann told me. “We are now rolling out a second, independent LLM review step. Translators must run the completed draft through a separate model using a dedicated comparison prompt designed to identify potential discrepancies, omissions, or inaccuracies relative to the source text. Initial findings suggest this is highly effective at detecting potential issues.”

Zimmermann added that if this method proves insufficient, OKA is considering introducing formal peer review mechanisms.

Using AI to check the output of AI for errors is a method that is historically prone to errors. For example, we recently reported on an AI-powered private school that used AI to check AI-generated questions for students. Internal testing found it had at least a 10 percent failure rate.

“I agree that using AI to check AI can absolutely fail, and in some contexts it can fail at very high rates. We’re not assuming the secondary model is reliable in isolation,” Zimmermann said. “The key point is that we’re not replacing human verification with automated verification. The second model is a complement to manual review, not a substitute for it.”

“When a coordinated project uses AI tools and operates at scale, it’s going to attract attention. I understand why editors would examine that closely. Ultimately, the outcome of the discussion formalized expectations that are largely aligned with our existing internal policies,” Zimmermann added. “However, these restrictions apply specifically to OKA translators. I would prefer that standards apply equally to everyone, but I also recognize that organized, funded efforts are often held to a higher bar.”
