If a language has only a few speakers left, how can a community revive it? Today, 3,000 languages are at risk of extinction due to the pressures of colonization, globalization, forced cultural assimilation, environmental devastation and other factors.
According to Canada’s Commission for Indigenous Languages, “research shows that no Indigenous language in Canada is safe and that all are in varying stages of endangerment.”
Our society is also being shaped by the rapid rise of artificial intelligence. Can AI be used for the benefit of Indigenous language survival in Canada and elsewhere?
According to the World Economic Forum, most AI chatbots are trained on 100 of the world’s 7,000 languages. English is the main driver of most large language models.
This scenario leaves the bulk of the world’s languages in the dust. In the coming years, will AI contribute to language revitalization, or language oppression?
A language in a box
In a 2023 TEDx talk, Northern Cheyenne computer engineer Michael Running Wolf shared his design of a cedar box that looks both ancient and contemporary. He described the dragonfly-adorned device as a “cedar-enclosed, offline Edge AI that contains the inner workings of a minimal voice-based language curricula — in other words, a language in a box.”
He proposed that conversational AI technology, much like Amazon Alexa or Google Home, could help language learners improve their fluency.
Running Wolf is the technical director of the First Languages AI Reality initiative at the Québec Institute for Artificial Intelligence. The program propels Indigenous scholars and technologists towards creating innovative solutions to language loss.
Voice-controlled tools trained via machine learning could serve as AI assistants for speakers who wish to hear unfamiliar sounds pronounced accurately, and practice their own pronunciation. This technology could establish a new means for facilitating oral transmission, which is crucial when there are few fluent speakers left.
At the heart of Running Wolf’s project is Indigenous data sovereignty, which ensures that Indigenous people retain control over their data.
A place in the digital world
On the other side of the world, in the Philippines, AI scholar and politician Anna Mae Yu Lamentillo is on a quest to support the Indigenous languages of her home country. She created NightOwlGPT, a new AI-powered translation app.
In an email to me, Lamentillo wrote:
“In the Philippines alone, we are working on nine languages, many of which are endangered. Our goal is to ensure that these languages — not just the dominant ones — have a place in the digital world.”
We have seen that in the hands of the powerful, AI software can lead to oppressive forms of control, such as excessive AI-powered surveillance by Amazon and the U.S. government’s unethical data mining tactics.
When it comes to the survival or extinction of languages, it is important to question the power behind AI tools. Who controls them, and who benefits from them?
When I asked about the democratization of AI, Lamentillo noted the need for inclusivity:
“AI’s rapid advancement could parallel historical patterns of colonization. If AI is truly a black swan event — a disruptive moment in history — then what happens when 99 per cent of languages are left behind? This is more than just a linguistic issue; it’s a serious matter of accessibility, representation and digital equity.
If we don’t change who is leading AI development, we risk creating a new form of colonization — one where only a small fraction of the world has the tools to thrive.”
Diversity of voices

At a recent workshop series on endangered languages, Emmanuel Ngué Um, a professor of linguistics at the University of Yaoundé I in Cameroon, spoke on behalf of a research team of African linguists.
They are currently using Mozilla’s Common Voice platform to create open-source datasets containing thousands of words and audio recordings in 31 African languages.
The platform aims to make speech recognition and voice-based AI more inclusive by crowd-sourcing a massively multilingual speech corpus. But this process is not without significant challenges in Africa.
Ngué Um noted that building datasets for languages with many dialects is not straightforward. There may be no standardized spelling or pronunciation for AI to treat as the accepted norm for the language.
Because of postcolonial changes, many African languages do not have one unified or agreed-upon writing system. This issue can slow the creation of teaching tools, but many local efforts backed by UNESCO are underway to change this.
So, how do automatic speech recognition tools deal with dialectal diversity? And how do text-to-speech models handle competing writing systems?
As Ngué Um wrote in an email to me:
“AI has been instrumental in delivering services that applied linguists have promised but are slow to deliver. This is not due to a lack of will or means on the part of linguists, but rather, because of the linguistic reality in Africa.
Despite the impact of colonization and the imposition of a monolithic ideal on language reality, Africa reflects the plurality, fluidity and resourcefulness that drive human communication…If AI is informed by these intricacies at all phases of its implementation, it will adequately address the diversity of voices…in Africa.”
It is clear that AI engineers and computational linguists need to integrate thoughtful approaches that take into account the unique circumstances of each language.
In the not-too-distant future, using AI tools to learn and communicate in under-resourced languages may become the norm. However, that shift depends on financial backing, accurate training data for machine learning, and community desire to embrace AI. Ultimately, data sovereignty and equitable access must be at the core of AI tools.

The post “How AI could help safeguard Indigenous languages” by Anna Luisa Daigneault, PhD Student in Linguistic Anthropology, Université de Montréal was published on 05/11/2025 by theconversation.com