TY - GEN
T1 - Munazarat 1.0
T2 - 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation
AU - Khader, Mohammad Majed
AU - Al-Sharafi, Abdul Gabbar
AU - Al-Sioufy, Mohamad Hamza
AU - Zaghouani, Wajdi
AU - Al-Zawqari, Ali
N1 - Publisher Copyright: © 2024 ELRA Language Resource Association.
PY - 2024
Y1 - 2024
N2 - This paper introduces the Corpus of Arabic Competitive Debates, Munazarat. Despite the significance of competitive debating in fostering critical thinking and promoting dialogue, researchers in the fields of Arabic Natural Language Processing (NLP), linguistics, argumentation studies, and education have limited access to datasets on competitive debating. At this stage of the study, we introduce Munazarat 1.0, which combines transcribed recordings of approximately 50 hours from 73 debates at QatarDebate-recognized tournaments, all available on YouTube. Munazarat is a novel specialized Arabic speech corpus, predominantly in Modern Standard Arabic (MSA), covering diverse debating topics and accompanied by metadata for each debate. The transcription of debates was performed using Fenek, a speech-to-text Kanari AI tool, and reviewed by three native Arabic speakers to enhance quality. The Munazarat 1.0 dataset can serve as a valuable resource for training Arabic NLP tools, developing argumentation mining machines, and analyzing Arabic argumentation and rhetoric styles.
KW - Arabic Speech Corpus
KW - Debates
KW - Modern Standard Arabic
UR - http://www.scopus.com/inward/record.url?scp=85195397223&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85195397223
T3 - 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation at LREC-COLING 2024 - Workshop Proceedings
SP - 20
EP - 30
BT - 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation at LREC-COLING 2024 - Workshop Proceedings
A2 - Al-Khalifa, Hend
A2 - Darwish, Kareem
A2 - Mubarak, Hamdy
A2 - Ali, Mona
A2 - Elsayed, Tamer
PB - European Language Resources Association (ELRA)
Y2 - 25 May 2024
ER -