An AI of Our Own has completed the second workshop of the Mozilla Common Voice API project, which we have been working on since the beginning of the year. The project aims to create a mobile-first, offline-capable application that allows Dagbani- and Khmer-speaking communities with limited digital access to contribute their voices to Mozilla’s Common Voice platform. This mobile, offline capability is a first for the platform and allows for broader inclusion and representation in the language datasets. The application, entitled “Voices of Our Own,” is in development, and recording will soon begin for the more than 5,000 sentences gathered in both languages. These sentences relate particularly to arts and culture and will build a strong database of culturally relevant recordings. The app will help people in low-bandwidth and rural communities contribute voice data to Mozilla Common Voice and ensure that underrepresented languages are part of the AI future.
Alongside the technological development, the project has included two community workshops: one in Ghana at the end of January and the other in Cambodia at the beginning of April. The Ghana workshop, organized by the Dagbani Wikimedians User Group, was a two-day event in Tamale aimed at training volunteers to annotate sentences on Mozilla Common Voice. Fifteen participants were chosen from that workshop to move ahead with annotations and recordings. (Read more about the workshop here.)
In Cambodia, 28 participants from Digital Divide Data (DDD) and Bophana Center’s documentary film program, among them members of Indigenous communities such as the Jarai, Kuy and Tampuon, came together to learn about open-source speech datasets, Mozilla Common Voice, and annotating language data.
The workshop was facilitated by Sophea Sok and CHY Sophat, both of whom are highly experienced in open-source data and technology and have been working for some time to add Khmer to Mozilla Common Voice. During the first phase of the project, they helped get the Khmer language officially added to the platform.
In addition to learning about the Common Voice platform, participants received hands-on training in using it to annotate Khmer language data. The workshop highlighted the platform’s role in preserving, revitalizing, and elevating the Khmer language through sharing, creating, and curating text and speech datasets.
The final step of this project is to complete the recordings with annotations and launch the application, which should be done by the end of April 2026.