TY - GEN
T1 - Adapting a resource-light highly multilingual Named Entity Recognition system to Arabic
AU - Zaghouani, Wajdi
AU - Pouliquen, Bruno
AU - Ebrahim, Mohamed
AU - Steinberger, Ralf
PY - 2010
Y1 - 2010
N2 - We present a working Arabic information extraction (IE) system that is used to analyze large volumes of news texts every day to extract the named entity (NE) types person, organization, location, date and number, as well as quotations (direct reported speech) by and about people. The Named Entity Recognition (NER) system was not developed specifically for Arabic; instead, an existing highly multilingual, almost language-independent NER system was adapted to cover Arabic as well. Arabic, a Semitic language, differs substantially from the Indo-European and Finno-Ugric languages currently covered. This paper therefore describes which Arabic-specific resources had to be developed and which changes had to be made to the otherwise language-independent rule set to make it applicable to Arabic. The evaluation results are generally satisfactory but could be improved for certain entity types.
AB - We present a working Arabic information extraction (IE) system that is used to analyze large volumes of news texts every day to extract the named entity (NE) types person, organization, location, date and number, as well as quotations (direct reported speech) by and about people. The Named Entity Recognition (NER) system was not developed specifically for Arabic; instead, an existing highly multilingual, almost language-independent NER system was adapted to cover Arabic as well. Arabic, a Semitic language, differs substantially from the Indo-European and Finno-Ugric languages currently covered. This paper therefore describes which Arabic-specific resources had to be developed and which changes had to be made to the otherwise language-independent rule set to make it applicable to Arabic. The evaluation results are generally satisfactory but could be improved for certain entity types.
UR - http://www.scopus.com/inward/record.url?scp=84977279466&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84977279466
T3 - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
SP - 563
EP - 567
BT - Proceedings of the 7th International Conference on Language Resources and Evaluation, LREC 2010
A2 - Tapias, Daniel
A2 - Russo, Irene
A2 - Hamon, Olivier
A2 - Piperidis, Stelios
A2 - Calzolari, Nicoletta
A2 - Choukri, Khalid
A2 - Mariani, Joseph
A2 - Mazo, Helene
A2 - Maegaard, Bente
A2 - Odijk, Jan
A2 - Rosner, Mike
PB - European Language Resources Association (ELRA)
T2 - 7th International Conference on Language Resources and Evaluation, LREC 2010
Y2 - 17 May 2010 through 23 May 2010
ER -