TY - GEN
T1 - A comparison of dimensionality reduction techniques for text retrieval
AU - Vinay, Vishwa
AU - Cox, Ingemar J.
AU - Wood, Ken
AU - Milic-Frayling, Natasa
PY - 2005
Y1 - 2005
AB - The growth of digital information increases the need to build better techniques for automatically storing, organizing and retrieving it. Much of this information is textual in nature and existing representation models struggle to deal with the high dimensionality of the resulting feature space. Techniques like Latent Semantic Indexing address, to some degree, the problem of high dimensionality in information retrieval. However, promising alternatives, like Random Mapping (RM), have yet to be completely studied in this context. In this paper, we show that despite the attention RM has received in other applications, in the case of text retrieval it is outperformed not only by Principal Component Analysis (PCA) and Independent Component Analysis (ICA) but also by a simple noise reduction algorithm.
UR - http://www.scopus.com/inward/record.url?scp=33847287398&partnerID=8YFLogxK
U2 - 10.1109/ICMLA.2005.2
DO - 10.1109/ICMLA.2005.2
M3 - Conference contribution
AN - SCOPUS:33847287398
SN - 0769524958
SN - 9780769524955
T3 - Proceedings - ICMLA 2005: Fourth International Conference on Machine Learning and Applications
SP - 293
EP - 298
BT - Proceedings - ICMLA 2005
T2 - ICMLA 2005: 4th International Conference on Machine Learning and Applications
Y2 - 15 December 2005 through 17 December 2005
ER -