Shaping AI in Africa: Power, Practice and Possibility

Panel Host: Masakhane

AI & Equality Festival of Ideas 2026

This panel brings together voices from across the Masakhane ecosystem to take an honest look at where things stand and where they need to go. 


The conversation moves from the ground up: data collection, community ownership, and the ethical questions that come with both, through the policy frameworks emerging at national and continental levels, to the landscape of African languages and the models being built to serve them.

Watch recording 

Facilitated by Leyla Roksan Caglar, Postdoctoral Fellow in the Windreich Department for AI & Human Health at the Icahn School of Medicine at Mount Sinai, and AI & Equality Festival Co-organiser

When we call a language low-resourced, who decided that?

The Dholuo language is spoken by millions. Hausa is one of the most widely spoken languages in the world. These are not low-resourced languages. They are under-resourced, and that is a political fact. That distinction opened the panel hosted by Masakhane at the AI & Equality Festival of Ideas, and it held for the full ninety minutes that followed.

Moderated by Lydia Taban, the conversation brought together four researchers and builders working at the intersection of African languages and AI: Bonaventure Dossou (McGill University and Mila), Deborah Kanubala (Saarland University and Parité Ethical AI Labs), Tajuddeen Gwadabe (Masakhane African Languages Hub), and Lilian Wanzare (Maseno University and the KenCorpus project). Together they offered an account of what it actually takes to build language technology for communities that global AI systems have largely ignored — and what it costs in time, trust, and invisible labour that rarely makes it into academic papers.

 
“It’s not just going to get data. It’s having the communities be partners in building this data and building the tools. That gives it a more meaningful sense to the involvement rather than ‘give me your voice and forget about it.’” — Lilian Wanzare, Maseno University

Dr. Lilian Wanzare described the work of building the KenCorpus project and the Kenyan Sign Language dataset as fundamentally relational. Trust with communities takes years to build and requires working through people of influence within each language community, people who understand the culture, the generational dynamics, and how to extend an invitation that feels genuine. For the sign language dataset, that meant stepping back entirely and allowing deaf community members to lead the engagement with their own community. The researcher’s job was to support, not to front.

Deborah Kanubala brought a fairness and ethics lens to the question of what these models are actually optimising for. Her work on NLP for Ghanaian languages surfaced the depth of what is at stake when idiomatic expressions, dialect variations, and local meanings are flattened by systems trained elsewhere. The invisible labour of transcription and annotation — done by people who understand not just the words but the meaning behind them — is rarely accounted for in the resources that fund or publish this work.

 
“Whilst building models, let’s always stop to ask ourselves: what happens, or who gets harmed, if it fails?” — Deborah Kanubala, Parité Ethical AI Labs

The panel’s most charged exchange came around data ownership. Gwadabe walked through the structural tensions that arise when funders require open licensing in the name of innovation, while communities bear the risk of extraction. He pointed to the Nwulite Obodo Open Data License (NOODL) and similar frameworks being developed to give communities genuine governance over how their data is used, including the ability to require benefit-sharing when commercial actors want access. Wanzare went further, drawing a distinction between the language community (all speakers) and the data community (those who actively contributed to a specific resource), arguing that both deserve a say in how that resource is governed going forward.

Bonaventure Dossou offered a different angle. Rather than organising his research around the data scarcity problem, he has moved toward building models that treat constraints as a design condition rather than a deficit. His work on resource-efficient machine learning and active learning asks how much less data a well-designed model actually needs — and what it means to build architectures that are African-centric from the ground up, rather than fine-tuned from systems built for other realities.

 
“As Africans, we should embrace our limitations. If someone has the freedom to do whatever they want and you have constraints, those constraints should force us to think differently, do things more optimally, in a smarter way.” — Bonaventure Dossou, McGill University and Mila

The panel closed with each speaker offering a vision for 2036. The futures they described were specific: African-centric model architectures, federated compute infrastructure shared across universities, an African chapter of the ACL, data sovereignty frameworks with teeth, and most fundamentally, a shift in how African communities think about their own languages as daily, living, generative resources in the digital space. The question Dr. Wanzare left the room with was the sharpest: why are we under-resourced? If we can really answer that, she said, we will not have to worry about NLP.

In May 2026, the AI & Equality community held its inaugural Festival of Ideas: a free, global, one-day gathering that ran across every time zone. Its full programme of 90-minute sessions brought together researchers, organisers, technologists, and communities doing the most urgent work at the intersection of algorithmic systems and human rights.

By joining our community you can take part in these events, stay up to date with developments on AI & Equality themes, and connect with people from all over the world who are committed to building technology for a better future.