Query term expansion by automatic learning of morphological equivalence patterns from Wikipedia

Kareem Darwish, Ahmed M. Ali, Ahmed Abdelali

Research output: Contribution to journal / Conference article / peer-review

1 Citation (Scopus)

Abstract

Retrieval in many languages would benefit from language-specific processing, such as stemming or morphological analysis. However, many languages lack such processing tools, or the available tools may be inadequate for retrieval because of language evolution. In this paper, we explore the use of Wikipedia redirects to automatically learn morphological equivalence patterns. Morphological variants are found automatically among Wikipedia redirects, and character-level alignment of these variants is used to generate character-level transformations. Then, given a query word, these transformations are applied to produce its morphological equivalents. The proposed method is language independent and can be applied to new languages without the need for linguistic knowledge. Although the aggregate performance of this approach may lag behind state-of-the-art stemming (or morphological analysis) for languages with good existing processors, the approach is generally safer than stemming: when it degrades a query, the degradation is generally marginal, whereas stemming can degrade queries significantly. We demonstrate its effectiveness for Arabic, English, Hungarian, and Portuguese.
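The learning and expansion steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the redirects have already been extracted and reduced to single-word (redirect, target) pairs, and it replaces full character-level alignment with a simpler longest-common-prefix alignment, so it only learns suffix rewrites (it would miss, for example, the prefix and infix changes common in Arabic). The function names and the thresholds `min_stem`, `max_suffix`, `min_count`, and the optional `vocab` filter are hypothetical, introduced only for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

Rule = Tuple[str, str]  # (source suffix, target suffix)


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def learn_suffix_rules(pairs: Iterable[Tuple[str, str]],
                       min_stem: int = 3, max_suffix: int = 4) -> Counter:
    """Count suffix-rewrite rules from candidate variant pairs.

    Simplification (assumption): the 'alignment' is the longest common
    prefix; whatever differs after it becomes a suffix rewrite rule.
    """
    rules: Counter = Counter()
    for a, b in pairs:
        a, b = a.lower(), b.lower()
        if a == b:
            continue
        k = common_prefix_len(a, b)
        if k < min_stem:  # hypothetical filter: require a shared stem
            continue
        sa, sb = a[k:], b[k:]
        if len(sa) > max_suffix or len(sb) > max_suffix:
            continue
        # Variants are treated as symmetric, so learn both directions.
        rules[(sa, sb)] += 1
        rules[(sb, sa)] += 1
    return rules


def expand(word: str, rules: Counter, vocab: Optional[Set[str]] = None,
           min_count: int = 1, min_stem: int = 3) -> Set[str]:
    """Return the query word plus variants generated by the learned rules."""
    word = word.lower()
    variants = {word}
    for (src, dst), count in rules.items():
        if count < min_count or not word.endswith(src):
            continue
        stem = word[: len(word) - len(src)]
        if len(stem) < min_stem:
            continue
        candidate = stem + dst
        # Optional (assumed) filter: keep only candidates seen in the collection.
        if vocab is None or candidate in vocab:
            variants.add(candidate)
    return variants


if __name__ == "__main__":
    pairs = [("Retrieving", "Retrieval"), ("Computers", "Computer"), ("Books", "Book")]
    rules = learn_suffix_rules(pairs)
    print(expand("cars", rules, vocab={"car", "cars", "book"}))
```

Because a rule such as `'' -> 's'` fires on almost any word, the sketch optionally restricts candidates to a vocabulary (for example, the index terms of the collection); without it, `expand("cars", rules)` would also produce the non-word `carss`. Ranking or thresholding rules by their counts (`min_count`) is another assumed way to keep expansion conservative.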

Original language: English
Pages (from-to): 24-29
Number of pages: 6
Journal: CEUR Workshop Proceedings
Volume: 1204
Publication status: Published - 2014
Event: Workshop on Semantic Matching in Information Retrieval, SMIR 2014 - Gold Coast, Australia
Duration: 11 Jul 2014 - 11 Jul 2014

Keywords

  • Inflection
  • Information Retrieval
  • Morphological Analysis
  • Query Expansion
