A comparison of dimensionality reduction techniques for text retrieval

Vishwa Vinay*, Ingemar J. Cox, Ken Wood, Natasa Milic-Frayling

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

14 Citations (Scopus)

Abstract

The growth of digital information increases the need to build better techniques for automatically storing, organizing and retrieving it. Much of this information is textual in nature and existing representation models struggle to deal with the high dimensionality of the resulting feature space. Techniques like Latent Semantic Indexing address, to some degree, the problem of high dimensionality in information retrieval. However, promising alternatives, like Random Mapping (RM), have yet to be completely studied in this context. In this paper, we show that despite the attention RM has received in other applications, in the case of text retrieval it is outperformed not only by Principal Component Analysis (PCA) and Independent Component Analysis (ICA) but also by a simple noise reduction algorithm.
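As an illustration of the Random Mapping idea named in the abstract, the following minimal sketch projects two toy term-frequency vectors into a lower-dimensional space with a random Gaussian matrix and compares their cosine similarity before and after. It is not the paper's experimental setup: the Gaussian variant of the mapping, the vocabulary size `d`, the target dimension `k`, the seeds, and the synthetic documents are all assumptions chosen for illustration.

```python
import math
import random


def random_mapping_matrix(k, d, seed=0):
    """Return a k x d matrix with i.i.d. Gaussian entries of variance 1/k."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Map a d-dimensional vector x to k dimensions: y = R x."""
    return [sum(r_ij * x_j for r_ij, x_j in zip(row, x)) for row in R]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


# Synthetic term-frequency vectors over a hypothetical 2000-term vocabulary.
d, k = 2000, 200
rng = random.Random(1)
doc_a = [rng.choice([0, 0, 0, 1, 2]) for _ in range(d)]
# doc_b copies most of doc_a's term counts, so the two documents are similar.
doc_b = [a if rng.random() < 0.7 else rng.choice([0, 0, 0, 1, 2]) for a in doc_a]

R = random_mapping_matrix(k, d)
sim_original = cosine(doc_a, doc_b)
sim_reduced = cosine(project(R, doc_a), project(R, doc_b))

print(f"cosine in {d} dimensions: {sim_original:.3f}")
print(f"cosine in {k} dimensions: {sim_reduced:.3f}")
```

The projection matrix is drawn without looking at the data, which is what makes this kind of mapping cheap compared with PCA or ICA; the sketch only shows that similarities are roughly preserved and says nothing about retrieval quality.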

Original language: English
Title of host publication: Proceedings - ICMLA 2005
Subtitle of host publication: Fourth International Conference on Machine Learning and Applications
Pages: 293-298
Number of pages: 6
Publication status: Published - 2005
Externally published: Yes
Event: ICMLA 2005: 4th International Conference on Machine Learning and Applications - Los Angeles, CA, United States
Duration: 15 Dec 2005 to 17 Dec 2005

Publication series

Name: Proceedings - ICMLA 2005: Fourth International Conference on Machine Learning and Applications
Volume: 2005

Conference

Conference: ICMLA 2005: 4th International Conference on Machine Learning and Applications
Country/Territory: United States
City: Los Angeles, CA
Period: 15/12/05 to 17/12/05
