Things change: Comparing results using historical data and user testing for evaluating a recommendation task

Soon Gyo Jung, Joni Salminen, Shammur A. Chowdhury, Dianne Ramirez Robillos, Bernard J. Jansen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

We address the task of recommending the next likely flight destination to customers of a major international airline. We compare performance measured on historical flight data with performance in an actual user evaluation. Using two years of historical flight data comprising tens of millions of flights, an ensemble approach and a collaborative filtering approach achieved accuracies of 47% and 20%, respectively, on a test set of 100,000 customers, highlighting the challenge of the domain. We then evaluated our recommendations with 10,000 actual customers, using a 45-45-10 split among the ensemble, collaborative filtering, and control groups. The overall predictive power with real users was 23%: 19% for the ensemble method and 30% for collaborative filtering. These results indicate that, in complex and shifting domains such as this one, historical data alone cannot be relied on to evaluate the impact of user recommendations. We discuss implications for recommendation systems and for future research in this and related domains.
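As a rough illustration of the online evaluation design described in the abstract, the following Python sketch randomly assigns customers to ensemble, collaborative filtering, and control arms in a 45-45-10 split, then computes predictive power as the share of customers whose next actual destination matches the single recommended destination. The helper names (`assign_arms`, `predictive_power`), the top-1 hit definition, and the toy data are assumptions for illustration only, not the authors' implementation.

```python
import random


def assign_arms(customer_ids, weights=(0.45, 0.45, 0.10), seed=0):
    """Hypothetical helper: shuffle customers and split them 45-45-10."""
    ids = list(customer_ids)
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    rng.shuffle(ids)
    n = len(ids)
    n_ens = round(weights[0] * n)
    n_cf = round(weights[1] * n)
    return {
        "ensemble": ids[:n_ens],
        "collaborative_filtering": ids[n_ens:n_ens + n_cf],
        "control": ids[n_ens + n_cf:],  # remainder (~10%) receives no recommendation
    }


def predictive_power(recommended, actual, customers):
    """Hypothetical metric: fraction of customers whose next actual
    destination equals the one destination recommended to them."""
    if not customers:
        return 0.0
    hits = sum(
        1
        for c in customers
        if actual.get(c) is not None and recommended.get(c) == actual.get(c)
    )
    return hits / len(customers)


if __name__ == "__main__":
    arms = assign_arms(range(10_000))
    print({name: len(members) for name, members in arms.items()})

    # Toy data with made-up destination codes
    recommended = {1: "AAA", 2: "BBB", 3: "CCC"}
    actual = {1: "AAA", 2: "DDD", 3: "CCC"}
    print(predictive_power(recommended, actual, [1, 2, 3]))
```

The same `predictive_power` function could score an offline hold-out set (for example, recommendations checked against customers' later flights in historical data), which is one way to put offline and online results on a comparable footing under these assumptions.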

Original language: English
Title of host publication: CHI EA 2020 - Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450368193
Publication status: Published - 25 Apr 2020
Event: 2020 ACM CHI Conference on Human Factors in Computing Systems, CHI EA 2020 - Honolulu, United States
Duration: 25 Apr 2020 to 30 Apr 2020

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Conference

Conference: 2020 ACM CHI Conference on Human Factors in Computing Systems, CHI EA 2020
Country/Territory: United States
City: Honolulu
Period: 25/04/20 to 30/04/20

Keywords

  • Algorithmic trade-off
  • Prediction
  • Recommendations
  • User study
