TY - GEN
T1 - Estimating retrieval effectiveness using rank distributions
AU - Vinay, Vishwa
AU - Milic-Frayling, Natasa
AU - Cox, Ingemar
PY - 2008
Y1 - 2008
N2 - In this paper, we consider the task of estimating query effectiveness, i.e., assessment of the retrieval system performance in the absence of user relevance judgments. In our approach we model the score associated with each document in the result set as a Gaussian random variable. The mean and the variance of each document score can then be used to estimate the probability that a document will be ranked above another one and thus calculate the expected rank of the document in the ranked list. We propose to measure the effectiveness of the system performance by comparing the predicted and actual ranks of the retrieved documents. In our experiments we consider two retrieval models and five document scoring methods and evaluate their impact on the proposed estimation measures. Our experiments with standardized data sets that include document relevance judgments and the task of predicting the relative query effectiveness show that the expected rank metric is robust to variations in document scoring and retrieval algorithms.
AB - In this paper, we consider the task of estimating query effectiveness, i.e., assessment of the retrieval system performance in the absence of user relevance judgments. In our approach we model the score associated with each document in the result set as a Gaussian random variable. The mean and the variance of each document score can then be used to estimate the probability that a document will be ranked above another one and thus calculate the expected rank of the document in the ranked list. We propose to measure the effectiveness of the system performance by comparing the predicted and actual ranks of the retrieved documents. In our experiments we consider two retrieval models and five document scoring methods and evaluate their impact on the proposed estimation measures. Our experiments with standardized data sets that include document relevance judgments and the task of predicting the relative query effectiveness show that the expected rank metric is robust to variations in document scoring and retrieval algorithms.
KW - Experimentation
KW - Measurement
KW - Reliability
UR - http://www.scopus.com/inward/record.url?scp=70349243699&partnerID=8YFLogxK
U2 - 10.1145/1458082.1458314
DO - 10.1145/1458082.1458314
M3 - Conference contribution
AN - SCOPUS:70349243699
SN - 9781595939913
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1425
EP - 1426
BT - Proceedings of the 17th ACM Conference on Information and Knowledge Management, CIKM'08
T2 - 17th ACM Conference on Information and Knowledge Management, CIKM'08
Y2 - 26 October 2008 through 30 October 2008
ER -