TY - GEN
T1 - Optimizing the cost of information retrieval test collections
AU - Hosseini, Mehdi
AU - Cox, Ingemar
AU - Milic-Frayling, Natasa
PY - 2011
Y1 - 2011
AB - We consider the problem of optimally allocating limited resources to construct relevance judgements for a test collection that facilitates reliable evaluation of retrieval systems. We assume that there is a large set of test queries, for each of which a large number of documents need to be judged, though the available budget only permits judging a subset of them. A candidate solution to this problem has to address at least three challenges. (i) Given a fixed budget, it has to efficiently select a subset of query-document pairs for acquiring relevance judgements. (ii) With the collected relevance judgements, it has to not only accurately evaluate the set of systems participating in the test collection construction but also reliably assess the performance of new, as yet unseen systems. (iii) Finally, it has to properly deal with uncertainty due to (a) the presence of unjudged documents in a ranked list, (b) the presence of queries with no relevance judgements, and (c) errors made by human assessors when labelling documents. In this thesis we propose an optimisation framework that accommodates appropriate solutions for each of the three challenges. Our approach is intended to benefit the construction of IR test collections by research institutes, e.g. NIST, and commercial search engines, e.g. Google and Bing, where document collections are large and query logs are abundant but economic constraints prohibit gathering comprehensive relevance judgements.
KW - convex optimisation
KW - evaluation
KW - resource allocation
KW - test collection
UR - http://www.scopus.com/inward/record.url?scp=83255189451&partnerID=8YFLogxK
U2 - 10.1145/2065003.2065020
DO - 10.1145/2065003.2065020
M3 - Conference contribution
AN - SCOPUS:83255189451
SN - 9781450309530
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 79
EP - 82
BT - CIKM 2011 Glasgow
T2 - 4th Workshop for Ph.D. Students in Information and Knowledge Management, PIKM'11
Y2 - 28 October 2011 through 28 October 2011
ER -