TY - GEN
T1 - On ranking the effectiveness of searches
AU - Vinay, Vishwa
AU - Cox, Ingemar J.
AU - Milic-Frayling, Natasa
AU - Wood, Ken
PY - 2006
Y1 - 2006
N2 - There is a growing interest in estimating the effectiveness of search. Two approaches are typically considered: examining the search queries and examining the retrieved document sets. In this paper, we take the latter approach. We use four measures to characterize the retrieved document sets and estimate the quality of search. These measures are (i) the clustering tendency as measured by the Cox-Lewis statistic, (ii) the sensitivity to document perturbation, (iii) the sensitivity to query perturbation and (iv) the local intrinsic dimensionality. We present experimental results for the task of ranking 200 queries according to the search effectiveness over the TREC (discs 4 and 5) dataset. Our ranking of queries is compared with the ranking based on the average precision using the Kendall τ statistic. The best individual estimator is the sensitivity to document perturbation and yields Kendall τ of 0.521. When combined with the clustering tendency based on the Cox-Lewis statistic and the query perturbation measure, it results in Kendall τ of 0.562 which to our knowledge is the highest correlation with the average precision reported to date.
AB - There is a growing interest in estimating the effectiveness of search. Two approaches are typically considered: examining the search queries and examining the retrieved document sets. In this paper, we take the latter approach. We use four measures to characterize the retrieved document sets and estimate the quality of search. These measures are (i) the clustering tendency as measured by the Cox-Lewis statistic, (ii) the sensitivity to document perturbation, (iii) the sensitivity to query perturbation and (iv) the local intrinsic dimensionality. We present experimental results for the task of ranking 200 queries according to the search effectiveness over the TREC (discs 4 and 5) dataset. Our ranking of queries is compared with the ranking based on the average precision using the Kendall τ statistic. The best individual estimator is the sensitivity to document perturbation and yields Kendall τ of 0.521. When combined with the clustering tendency based on the Cox-Lewis statistic and the query perturbation measure, it results in Kendall τ of 0.562 which to our knowledge is the highest correlation with the average precision reported to date.
KW - Query performance prediction
UR - http://www.scopus.com/inward/record.url?scp=33750306680&partnerID=8YFLogxK
U2 - 10.1145/1148170.1148239
DO - 10.1145/1148170.1148239
M3 - Conference contribution
AN - SCOPUS:33750306680
SN - 1595933697
SN - 9781595933690
T3 - Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
SP - 398
EP - 404
BT - Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
PB - Association for Computing Machinery (ACM)
T2 - 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
Y2 - 6 August 2006 through 11 August 2006
ER -