Abstract
Retrieval in many languages would benefit from language-specific processing, such as stemming or morphological analysis. However, many languages lack such processing tools, or the available tools may be inadequate for retrieval because of language evolution. In this paper, we explore the use of Wikipedia redirects to automatically learn morphological equivalence patterns. Morphological variants are found automatically in Wikipedia redirects, and character-level alignment of these variants is used to generate character-level transformations. Then, given a query word, the transformations are used to produce its morphological equivalents. The proposed method is language independent and can be applied to new languages without the need for linguistic knowledge. Although the performance of this approach may lag behind state-of-the-art stemming (or morphological analysis) in aggregate for languages with good existing processors, the approach is generally safer than stemming: when it degrades a query, the degradation is generally marginal, whereas stemming can degrade queries significantly. We show its success for Arabic, English, Hungarian, and Portuguese.
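The learn-then-expand pipeline described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper. The variant groups are hand-written toy data standing in for groups mined from Wikipedia redirects. The alignment is a simple longest-common-prefix alignment, so it only captures suffix rewrites; the paper's character-level alignment may also handle prefix or infix changes, which matter for languages such as Arabic. The names `learn_rules`, `expand`, `min_stem`, `min_count`, and `vocabulary` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Optional


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest prefix shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def extract_rule(a: str, b: str, min_stem: int = 3) -> Optional[tuple[str, str]]:
    """Align a and b on their common prefix (a simplification of full
    character-level alignment) and return the suffix rewrite
    (old_suffix, new_suffix) that turns a into b, or None if the words are
    identical or share fewer than min_stem leading characters."""
    k = common_prefix_len(a, b)
    if a == b or k < min_stem:
        return None
    return (a[k:], b[k:])


def learn_rules(groups: Iterable[Iterable[str]],
                min_count: int = 2) -> dict[tuple[str, str], int]:
    """Count suffix rewrites over every ordered pair of variants within each
    group and keep only rules observed at least min_count times."""
    counts: Counter = Counter()
    for group in groups:
        for a, b in combinations(sorted(set(group)), 2):
            for x, y in ((a, b), (b, a)):
                rule = extract_rule(x, y)
                if rule is not None:
                    counts[rule] += 1
    return {rule: c for rule, c in counts.items() if c >= min_count}


def expand(word: str, rules: dict[tuple[str, str], int],
           vocabulary: Optional[set[str]] = None,
           min_stem: int = 3) -> list[str]:
    """Apply every learned rule whose old suffix matches the word, leaving a
    stem of at least min_stem characters. If a vocabulary is given
    (a hypothetical filter, e.g. the collection's term list), keep only
    candidates that occur in it."""
    out = set()
    for old, new in rules:
        if word.endswith(old) and len(word) - len(old) >= min_stem:
            out.add(word[: len(word) - len(old)] + new)
    out.discard(word)
    if vocabulary is not None:
        out &= vocabulary
    return sorted(out)


if __name__ == "__main__":
    # Toy variant groups; in the described method these would come from redirects.
    groups = [["walk", "walks", "walked"],
              ["play", "plays", "played"],
              ["jump", "jumped"]]
    rules = learn_rules(groups)
    print(expand("talk", rules))
    print(expand("looks", rules, vocabulary={"look", "looked", "looking"}))
```

With these toy groups, `expand("talk", rules)` yields `['talked', 'talks']`. Without the vocabulary filter, `expand("looks", rules)` also produces non-words such as `lookss`, which is why some form of candidate filtering is plausibly needed; restricting to a term list is only one illustrative choice.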
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 24-29 |
Number of pages | 6 |
Journal | CEUR Workshop Proceedings |
Volume | 1204 |
Publication status | Published - 2014 |
Event | Workshop on Semantic Matching in Information Retrieval (SMIR 2014), Gold Coast, Australia, 11 Jul 2014 |
Keywords
- Inflection
- Information Retrieval
- Morphological Analysis
- Query Expansion