Critical Survey of the Freely Available Arabic Corpora

Wajdi Zaghouani

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The availability of corpora is a major factor in building natural language processing applications. However, the cost of acquiring corpora can prevent some researchers from going further in their endeavours. Easy access to freely available corpora is urgently needed in the NLP research community, especially for languages such as Arabic. Currently, there is no easy way to access a comprehensive and up-to-date list of freely available Arabic corpora. In this paper, we present the results of a recent survey conducted to identify freely available Arabic corpora and language resources. Our preliminary results yielded an initial list of 66 sources. We present our findings in the various categories studied and provide direct links to the data where possible.
Original language: English
Title of host publication: LREC 2014 - Ninth International Conference on Language Resources and Evaluation
Editors: N. Calzolari, K. Choukri, T. Declerck, H. Loftsson, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, S. Piperidis
Publisher: European Language Resources Association (ELRA)
Number of pages: 8
ISBN (Electronic): 978-2-9517408-8-4
Publication status: Published - 2014
Event: 9th International Conference on Language Resources and Evaluation (LREC) - Reykjavik, Iceland
Duration: 26 May 2014 - 31 May 2014

Conference

Conference: 9th International Conference on Language Resources and Evaluation (LREC)
Country/Territory: Iceland
City: Reykjavik
Period: 26/05/14 - 31/05/14

Keywords

  • Arabic
  • Corpora
  • Corpus
  • Free
  • Open source
  • Survey
