TY - GEN
T1 - Arabic retrieval revisited
T2 - 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012
AU - Darwish, Kareem
AU - Ali, Ahmed M.
PY - 2012
Y1 - 2012
AB - Due to Arabic's morphological complexity, Arabic retrieval benefits greatly from morphological analysis, particularly stemming. However, the best-known stemming does not handle linguistic phenomena such as broken plurals and malformed stems. In this paper, we propose a model of character-level morphological transformation that is trained using Wikipedia hypertext-to-page-title links. The use of our model yields statistically significant improvements in Arabic retrieval over the use of the best statistical stemming technique. The technique can potentially be applied to other languages.
UR - http://www.scopus.com/inward/record.url?scp=84878206852&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878206852
SN - 9781937284251
T3 - 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
SP - 218
EP - 222
BT - 50th Annual Meeting of the Association for Computational Linguistics, ACL 2012 - Proceedings of the Conference
Y2 - 8 July 2012 through 14 July 2012
ER -