TY - GEN
T1 - Selecting a subset of queries for acquisition of further relevance judgements
AU - Hosseini, Mehdi
AU - Cox, Ingemar J.
AU - Milic-Frayling, Natasa
AU - Vinay, Vishwa
AU - Sweeting, Trevor
PY - 2011
Y1 - 2011
AB - Assessing the relative performance of search systems requires a test collection with a pre-defined set of queries and corresponding relevance assessments. The state-of-the-art process of constructing test collections involves using a large number of queries and selecting, for each query, a set of documents submitted by a group of participating systems to be judged. However, the initial set of judgments may be insufficient to reliably evaluate the performance of future, as yet unseen, systems. In this paper, we propose a method that expands the set of relevance judgments as new systems are evaluated. We assume that there is a limited budget for acquiring additional relevance judgments. From the documents retrieved by the new systems we create a pool of unjudged documents. Rather than uniformly distributing the budget across all queries, we first select a subset of queries that are effective in evaluating systems and then uniformly allocate the budget across only these queries. Experimental results on the TREC 2004 Robust track test collection demonstrate the superiority of this budget allocation strategy.
UR - http://www.scopus.com/inward/record.url?scp=80052988568&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-23318-0_12
DO - 10.1007/978-3-642-23318-0_12
M3 - Conference contribution
AN - SCOPUS:80052988568
SN - 9783642233173
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 113
EP - 124
BT - Advances in Information Retrieval Theory - Third International Conference, ICTIR 2011, Proceedings
T2 - 3rd International Conference on the Theory of Information Retrieval, ICTIR 2011
Y2 - 12 September 2011 through 14 September 2011
ER -