Munazarat 1.0: A Corpus of Arabic Competitive Debates

Mohammad Majed Khader, Abdul Gabbar Al-Sharafi, Mohamad Hamza Al-Sioufy, Wajdi Zaghouani, Ali Al-Zawqari

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

This paper introduces Munazarat, the Corpus of Arabic Competitive Debates. Despite the significance of competitive debating in fostering critical thinking and promoting dialogue, researchers in Arabic Natural Language Processing (NLP), linguistics, argumentation studies, and education have limited access to datasets on competitive debating. At this stage of the study, we introduce Munazarat 1.0, which comprises approximately 50 hours of transcribed recordings from 73 debates at QatarDebate-recognized tournaments, all available on YouTube. Munazarat is a novel specialized Arabic speech corpus, predominantly in Modern Standard Arabic (MSA), covering diverse debating topics and accompanied by metadata for each debate. The debates were transcribed using Fenek, a speech-to-text tool from Kanari AI, and the transcripts were reviewed by three native Arabic speakers to enhance quality. The Munazarat 1.0 dataset can serve as a valuable resource for training Arabic NLP tools, developing argumentation mining systems, and analyzing Arabic argumentation and rhetorical styles.

Original language: English
Title of host publication: 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation at LREC-COLING 2024 - Workshop Proceedings
Editors: Hend Al-Khalifa, Kareem Darwish, Hamdy Mubarak, Mona Ali, Tamer Elsayed
Publisher: European Language Resources Association (ELRA)
Pages: 20-30
Number of pages: 11
ISBN (Electronic): 9782493814364
Publication status: Published - 2024
Event: 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation - Torino, Italy
Duration: 25 May 2024 → …

Publication series

Name: 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation at LREC-COLING 2024 - Workshop Proceedings

Conference

Conference: 6th Workshop on Open-Source Arabic Corpora and Processing Tools, OSACT 2024 with Shared Tasks on Arabic LLMs Hallucination and Dialect to MSA Machine Translation
Country/Territory: Italy
City: Torino
Period: 25/05/24 → …

Keywords

  • Arabic Speech Corpus
  • Debates
  • Modern Standard Arabic
