Artificial intelligence is creating a damaging feedback loop for vulnerable languages: AI systems are trained on Wikipedia data that is increasingly filled with inaccurate AI-generated translations, so they learn from flawed text and go on to reproduce and amplify its errors. This ‘linguistic doom loop’ is particularly harmful to languages with fewer speakers, for which Wikipedia often serves as the primary source of online linguistic data.
Reports indicate that on some African-language versions of Wikipedia, an estimated 40-60% of articles are uncorrected machine translations. The problem is compounded by contributors who use tools such as Google Translate to produce large volumes of content without proper linguistic understanding, effectively ‘hijacking’ these language Wikipedias. The cumulative effect threatens the survival of these languages, potentially discouraging younger generations who encounter little beyond substandard translations in their own tongue.
While some communities, such as the Inari Saami, prioritize quality control by meticulously copy-editing every article, the broader situation is concerning. Jacob Judah, an investigative journalist, warns of the long-term damage that the spread of AI-generated linguistic errors on platforms like Wikipedia could inflict, particularly on already marginalized languages.