Optimizing the cost of information retrieval test collections

Mehdi Hosseini*, Ingemar Cox, Natasa Milic-Frayling

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

2 Citations (Scopus)

Abstract

We consider the problem of optimally allocating limited resources to construct relevance judgements for a test collection that facilitates reliable evaluation of retrieval systems. We assume that there is a large set of test queries, each with a large number of documents that need to be judged, although the available budget only permits judging a subset of them. A candidate solution to this problem has to deal with at least three challenges. (i) Given a fixed budget, it has to efficiently select a subset of query-document pairs for which to acquire relevance judgements. (ii) Using the collected relevance judgements, it has to be able not only to accurately evaluate the set of systems that participated in constructing the test collection, but also to reliably assess the performance of new, as yet unseen systems. (iii) Finally, it has to properly handle uncertainty due to (a) the presence of unjudged documents in a ranked list, (b) the presence of queries with no relevance judgements, and (c) errors made by human assessors when labelling documents. In this thesis we propose an optimisation framework that accommodates appropriate solutions for each of the three challenges. Our approach is intended to benefit the construction of IR test collections by research institutes, e.g. NIST, or by commercial search engines, e.g. Google and Bing, which have large-scale document collections and extensive query logs but whose economic constraints prohibit gathering comprehensive relevance judgements.
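Challenge (iii)(a), uncertainty caused by unjudged documents in a ranked list, can be made concrete with a small example. The sketch below is not the authors' framework; it assumes binary relevance labels and the precision-at-k metric, and shows how unjudged documents turn a single score into an interval. The function name `precision_at_k_bounds` is hypothetical.

```python
from typing import Dict, List, Tuple


def precision_at_k_bounds(
    ranking: List[str], qrels: Dict[str, int], k: int
) -> Tuple[float, float]:
    """Return (lower, upper) bounds on precision@k for one ranked list.

    `qrels` maps judged document ids to a binary label (1 = relevant,
    0 = non-relevant); documents missing from `qrels` are unjudged.
    The lower bound treats every unjudged document as non-relevant,
    the upper bound treats every unjudged document as relevant.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranking[:k]
    judged_relevant = sum(1 for d in top if qrels.get(d) == 1)
    unjudged = sum(1 for d in top if d not in qrels)
    # Divide by k rather than len(top), a common convention that
    # penalises rankings shorter than the cutoff.
    lower = judged_relevant / k
    upper = (judged_relevant + unjudged) / k
    return lower, upper


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4", "d5"]
    qrels = {"d1": 1, "d3": 0, "d4": 1}  # d2 and d5 are unjudged
    print(precision_at_k_bounds(ranking, qrels, 5))
```

With two unjudged documents in the top five, the interval is (0.4, 0.8); its width is always the fraction of unjudged documents in the top k, so each additional judgement in the top k narrows it by 1/k.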

Original language: English
Title of host publication: CIKM 2011 Glasgow
Subtitle of host publication: PIKM'11 - Proceedings of the 2011 Workshop for Ph.D. Students in Information and Knowledge Management
Pages: 79-82
Number of pages: 4
Publication status: Published - 2011
Externally published: Yes
Event: 4th Workshop for Ph.D. Students in Information and Knowledge Management, PIKM'11 - Glasgow, United Kingdom
Duration: 28 Oct 2011 → 28 Oct 2011

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Conference

Conference: 4th Workshop for Ph.D. Students in Information and Knowledge Management, PIKM'11
Country/Territory: United Kingdom
City: Glasgow
Period: 28/10/11 → 28/10/11

Keywords

  • convex optimisation
  • evaluation
  • resource allocation
  • test collection
