Wikipedia, the online encyclopedia, has teamed up with KTH Royal Institute of Technology to develop the world's first crowdsourced speech engine. The platform aims to provide access to Wikipedia and other wikis for users with reading difficulties or visual impairments.
The platform, which will be optimised for Wikipedia, would be available as open source and usable by any website that runs MediaWiki – an open-source wiki package. Following the development of English, Swedish and Arabic speech engines by September 2017, the service would be extended to the rest of the roughly 280 languages in which Wikipedia is available.
"Initially our focus will be on the Swedish language, where we will make use of our own language resources," said Joakim Gustafson, professor of speech technology at KTH. "Then we will do a basic English voice, which we expect to be quite good given the large amount of open-source linguistic resources. And finally, we will do a rudimentary Arabic voice that will be more of a proof of concept."
Much like Wikipedia itself, the speech engine's output would be crowdsourced, allowing users to contribute to the development of the synthesizer. The content produced would be freely licensed and open to everyone, in accordance with the rules of Wikimedia Commons.
The Wikispeech pilot project is a collaboration between KTH, the Swedish Post and Telecom Authority, Wikimedia Sweden and STTS speech technology services. Jonas Beskow, professor of speech communication at KTH, and Zofia Malisz would lead the project.
The team has already conducted a pilot study; according to Wikimedia Sweden, 25% of Wikipedia users (about 125 million people per month) prefer text in spoken form.
"We will build an open framework where any open source speech synthesizer can be plugged in. Since these are open-source modules, it will also be possible to add or substitute certain modules in the text-to-speech (TTS) system," Gustafson told TechCrunch. "The TTS will be open source so anybody could use that functionality for any use – not only reading wiki (or other) web pages," he added.
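To illustrate the kind of modular framework Gustafson describes, here is a minimal Python sketch of a TTS pipeline whose stages can be added or substituted independently. The class names, module stages and placeholder outputs are hypothetical, chosen for illustration; they are not taken from the Wikispeech codebase.

```python
# Hypothetical sketch of a pluggable TTS pipeline: each stage is an
# independent module that can be swapped for another open-source one.
# All names here are illustrative, not from the Wikispeech project.

class TextNormalizer:
    """Expands symbols and numbers into plain words."""
    def process(self, text):
        return text.replace("25%", "twenty-five percent")

class PhoneticTranscriber:
    """Maps each word to a phoneme string (e.g. via a lexicon lookup)."""
    def process(self, text):
        return ["w o r d" for _ in text.split()]  # placeholder phonemes

class WaveformSynthesizer:
    """Turns phoneme sequences into audio samples."""
    def process(self, phonemes):
        return b""  # placeholder audio bytes

class TTSPipeline:
    """Runs registered modules in order; any stage can be substituted."""
    def __init__(self, modules):
        self.modules = modules

    def replace(self, index, module):
        # Swap in a different open-source module at the given stage.
        self.modules[index] = module

    def speak(self, text):
        data = text
        for module in self.modules:
            data = module.process(data)
        return data

pipeline = TTSPipeline([TextNormalizer(),
                        PhoneticTranscriber(),
                        WaveformSynthesizer()])
audio = pipeline.speak("25% of users")
```

Because each stage only agrees on its input and output types, a website could, for example, keep the normalizer and transcriber but plug in a different waveform synthesizer for its language.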
Gustafson further said the group wants to explore the possibility of letting users record how a word should be pronounced and then having the transcription corrected automatically. "In the first stage they will have to use phonetic transcription (IPA) to correct the dictionary, but we will explore the possibility for a user to record how it should be pronounced and have that automatically correct the transcription," said Gustafson.
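The first-stage mechanism Gustafson mentions, users correcting a pronunciation dictionary with IPA transcriptions, can be sketched as follows. The lexicon format, the example words and the `correct_entry` helper are all hypothetical, not part of the Wikispeech project.

```python
# Hypothetical sketch of crowdsourced pronunciation fixes via IPA.
# Entries and helper names are illustrative only.

lexicon = {
    "wiki": "ˈwɪki",   # existing entry
    "gif": "ɡɪf",      # entry a user might want to correct
}

def correct_entry(word, ipa):
    """Apply a user-submitted IPA transcription to the dictionary."""
    lexicon[word] = ipa
    return lexicon[word]

# A user submits a corrected IPA string for an entry:
correct_entry("gif", "dʒɪf")
```

In the later stage Gustafson describes, a user's recording would be transcribed automatically and the resulting IPA string fed into the same correction step, removing the need for users to know IPA themselves.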
"This is probably something we will do in the next project, where we will extend the system to allow users to build their own voices. We will have them read 30 minutes of text and then morph a voice (trained on 10 hours of speech) to sound like them."