The Poio project develops language technologies to support communication in lesser-used and under-resourced languages on and with electronic devices. Within the Poio project we develop text input services with text prediction and transliteration for mobile devices and desktop users to allow conversation between individuals and in online communities.
In this lightning talk I will present the current architecture of the Poio Corpus, our corpus collection and data management pipeline. I will show how to add a new language to the corpus and how you can use the pipeline to build language models for the predictive text technology. Our goal is to make collaboration with language communities as smoothless as possible, so that developers, data engineers and speakers of under-ressourced language can collaborate to build grassroots language technologies. Poio started as a language revitalization project at the Interdisciplinary Centre for Social and Language Documentation in Minde/Portugal, a non-profit organization dedicated to the documentation and preservation of linguistic heritage.