TY - GEN
T1 - Multi-reference WER for evaluating ASR for languages with no orthographic rules
AU - Ali, Ahmed
AU - Magdy, Walid
AU - Bell, Peter
AU - Renals, Steve
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2016/2/10
Y1 - 2016/2/10
N2 - Languages with no standard orthographic representation face a challenge in evaluating the output of Automatic Speech Recognition (ASR), since the reference transcription can vary widely from one user to another. We propose an innovative approach for evaluating speech recognition using multiple references. For each recognized speech segment, we ask five different users to transcribe the speech. We combine the alignments of the multiple references and use the combined alignment to report a modified version of Word Error Rate (WER). This approach accepts a recognized word if any of the references contains it in the same form. Results are reported on two Dialectal Arabic (DA) varieties, as languages with no standard orthography: Egyptian and North African speech. The average WER over the five individual references is 71.4% and 80.1%, respectively. When all references are combined, the Multi-Reference WER (MR-WER) was found to be 39.7% and 45.9%, respectively.
AB - Languages with no standard orthographic representation face a challenge in evaluating the output of Automatic Speech Recognition (ASR), since the reference transcription can vary widely from one user to another. We propose an innovative approach for evaluating speech recognition using multiple references. For each recognized speech segment, we ask five different users to transcribe the speech. We combine the alignments of the multiple references and use the combined alignment to report a modified version of Word Error Rate (WER). This approach accepts a recognized word if any of the references contains it in the same form. Results are reported on two Dialectal Arabic (DA) varieties, as languages with no standard orthography: Egyptian and North African speech. The average WER over the five individual references is 71.4% and 80.1%, respectively. When all references are combined, the Multi-Reference WER (MR-WER) was found to be 39.7% and 45.9%, respectively.
KW - Under-Resource
KW - WER
UR - http://www.scopus.com/inward/record.url?scp=84964440067&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2015.7404847
DO - 10.1109/ASRU.2015.7404847
M3 - Conference contribution
AN - SCOPUS:84964440067
T3 - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
SP - 576
EP - 580
BT - 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015
Y2 - 13 December 2015 through 17 December 2015
ER -