AI often mangles African languages. Local scientists and volunteers are taking it back to school


Article by Sandeep Ravindran: “Imagine joyfully announcing to your Facebook friends that your wife gave birth, and having Facebook automatically translate your words to “my prostitute gave birth.” Shamsuddeen Hassan Muhammad, a computer science Ph.D. student at the University of Porto, says that’s what happened to a friend when Facebook’s English translation mangled the nativity news he shared in his native language, Hausa.

Such errors in artificial intelligence (AI) translation are common with African languages. AI may be increasingly ubiquitous, but if you’re from the Global South, it probably doesn’t speak your language.

That means Google Translate isn’t much help, and speech recognition tools such as Siri or Alexa can’t understand you. All of these services rely on a field of AI known as natural language processing (NLP), which allows AI to “understand” a language. The overwhelming majority of the world’s 7000 or so languages lack data, tools, or techniques for NLP, making them “low-resourced,” in contrast with a handful of “high-resourced” languages such as English, French, German, Spanish, and Chinese.

Hausa is the second most spoken African language, with an estimated 60 million to 80 million speakers, and it’s just one of more than 2000 African languages that are mostly absent from AI research and products. The few products available don’t work as well as those for English, notes Graham Neubig, an NLP researcher at Carnegie Mellon University. “It’s not the people who speak the languages making the technology.” More often the technology simply doesn’t exist. “For example, now you cannot talk to Siri in Hausa, because there is no data set to train Siri,” Muhammad says.

He is trying to fill that gap with a project he co-founded called HausaNLP, one of several launched within the past few years to develop AI tools for African languages…”