NatiQ: An End-to-end Text-to-Speech System for Arabic

Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

5 Citations (Scopus)

Abstract

NatiQ is an end-to-end text-to-speech system for Arabic. Our speech synthesizer uses an encoder-decoder architecture with attention. We used both Tacotron-based models (Tacotron-1 and Tacotron-2) and the faster Transformer model for generating mel-spectrograms from characters. We concatenated Tacotron-1 with the WaveRNN vocoder, Tacotron-2 with the WaveGlow vocoder, and the ESPnet Transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms. To train our models, we used in-house speech data for two voices: 1) a neutral male voice, “Hamza”, narrating general content and news, and 2) an expressive female voice, “Amina”, narrating children's story books. Our best systems achieve an average Mean Opinion Score (MOS) of 4.21 and 4.40 for Amina and Hamza, respectively. The objective evaluation of the systems using word and character error rate (WER and CER), as well as the response time measured by real-time factor, favored the end-to-end architecture ESPnet. The NatiQ demo is available online at https://tts.qcri.org.
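The objective metrics named in the abstract (WER, CER and real-time factor) can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names are hypothetical, and it assumes the common setup in which the hypothesis text comes from transcribing the synthesized audio and is compared against the input text using Levenshtein edit distance. Real-time factor is taken here as synthesis time divided by the duration of the generated audio.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean synthesis runs faster than playback."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds
```

Lower values are better for all three quantities. A system that takes 0.5 s to synthesize 2 s of audio, for instance, has a real-time factor of 0.25 under this definition.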

Original language: English
Title of host publication: WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 394-398
Number of pages: 5
ISBN (Electronic): 9781959429272
Publication status: Published - 2022
Event: 7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022 - Abu Dhabi, United Arab Emirates
Duration: 8 Dec 2022 → …

Publication series

Name: WANLP 2022 - 7th Arabic Natural Language Processing - Proceedings of the Workshop

Conference

Conference: 7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022
Country/Territory: United Arab Emirates
City: Abu Dhabi
Period: 8/12/22 → …
