Although Africa is home to a huge portion of the world's languages, well over a quarter according to some estimates, many are missing when it comes to the development of artificial intelligence (AI).

This is an issue of both a lack of investment and a lack of readily available data.

Most AI tools used today, such as ChatGPT, are trained on English as well as other European and Chinese languages.

These have vast quantities of online text to draw from.

But as many African languages are mostly spoken rather than written down, there is too little text on which to train AI to make it useful for speakers of those languages.

For millions across the continent, this means being left out.

Researchers who have been trying to address this issue have recently released what is thought to be the largest known dataset of African languages.

“We think in our own languages, dream in them and interpret the world through them. If technology doesn’t reflect that, a whole group risks being left behind,” says Prof Vukosi Marivate from the University of Pretoria.

The African Next Voices project brought together linguists and computer scientists to create AI-ready datasets in 18 African languages. While this may just be a small portion of the more than 2,000 languages estimated to be spoken across the continent, the project aims to expand in the future.

Within two years, the team recorded 9,000 hours of speech across Kenya, Nigeria, and South Africa, capturing everyday scenarios in farming, health, and education.

Using the newly created datasets, apps like AI-Farmer assist local farmers such as Kelebogile Mosime by providing information in their native languages.

“As someone still learning to farm, you face a lot of challenges,” says Mosime. “Daily, I see the benefits of being able to use my home language on the app when I run into problems on the farm.”

The initiative, funded by a $2.2 million grant from the Gates Foundation, aims to create inclusive AI technologies that African communities can put to beneficial use.

Overall, these efforts reflect an understanding that language is not just communication; it's a way of knowing and understanding the world.