Best practices for crowdsourcing dialectal arabic speech transcription

Samantha Wray, Hamdy Mubarak, Ahmed Ali

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

6 Citations (Scopus)

Abstract

In this paper, we investigate different approaches in crowdsourcing transcriptions of Dialectal Arabic speech with automatic quality control to ensure good transcription at the source. Since Dialectal Arabic has no standard orthographic representation, it is very challenging to perform quality control. We propose a complete recipe for speech transcription quality control that includes using output of an Automatic Speech Recognition system. We evaluated the quality of the transcribed speech and through this recipe, we achieved a reduction in transcription error of 1.0% compared with 13.2% baseline with no quality control for Egyptian data, and down to 4% compared with 7.8% for the North African dialect.

Original languageEnglish
Title of host publication2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings
EditorsNizar Habash, Stephan Vogel, Kareem Darwish
PublisherAssociation for Computational Linguistics (ACL)
Pages99-107
Number of pages9
ISBN (Electronic)9781941643587
DOIs
Publication statusPublished - 2015
Event2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - Beijing, China
Duration: 30 Jul 2015 → …

Publication series

Name2nd Workshop on Arabic Natural Language Processing, ANLP 2015 - held at 53rd Annual Meeting of the Association for Computational Linguistics, ACL 2015 - Proceedings

Conference

Conference2nd Workshop on Arabic Natural Language Processing, ANLP 2015
Country/TerritoryChina
CityBeijing
Period30/07/15 → …

Fingerprint

Dive into the research topics of 'Best practices for crowdsourcing dialectal arabic speech transcription'. Together they form a unique fingerprint.

Cite this