As of December 2021, Mozilla Common Voice listed 154 locales, but only 87 of them had fulfilled the requirements to start collecting voices, and 27 of those are fairly new. In this two-part presentation, we want to give new language communities some starting points and share the knowledge we accumulated over the past year while working on Turkish, an under-resourced language, together with our initial training results.
The presentation covers the following topics: resources on Mozilla Common Voice, how to analyze your dataset, how to set goals, how to design a social media campaign, which tools you can use (Google Colab, Coqui STT), and our roundups of training on the Common Voice Turkish dataset, v1 through v7.0. Throughout, we share the successes and failures of the Common Voice Turkish Volunteers group as lessons learned.
Speakers: Bülent Özden