Be a star: Data analytics for soccer careerpath recommendation

  • Yang, David (Principal Investigator)
  • Schneider, Jens (Lead Principal Investigator)
  • Swart-Arries, Kamilla (Principal Investigator)
  • Engineer-1 (Engineer)
  • Engineer-2 (Engineer)
  • Engineer-3 (Engineer)
  • Assistant-2, Research (Research Assistant)
  • Saad, Mr.Adel (Principal Investigator)
  • Ghanem, Prof.Bernard (Principal Investigator)
  • Ali, Yara Zeyad (Research Assistant)

Project: Applied Research

Project Details

Abstract

Entering professional soccer is extremely hard: 200,000,000 aspiring soccer players compete for only 200,000 professional positions worldwide. Many of them pay professional agents to help them identify career opportunities. However, misinformation and similarity bias cause agents and clubs to focus only on a select group of players, leading to missed opportunities and wasted talent. Herein, we propose a data mining and analytics platform that recommends important next career opportunities to aspiring players. Designing a recommender system for the sports industry is extremely difficult, requiring expertise in data science and AI as well as experience in sports management and marketing. As a result, we are not aware of any existing solution. Aspiring players offer agents only slim chances of transfer revenues yet require a significant investment of time. Thus, there is a striking lack of data on aspiring players. Professional players, in contrast, generate such revenues for agents. Since both are eager to report to the media, such data can be scraped at scale from public domain sources. Yet aspiring players are not only willing to pay for chances to play in front of teams and coaches; they are also willing to share their data with agents or on social media. Harnessing this willingness to close the gap in data availability creates a win-win situation: we envision a holistic coaching platform for underrepresented, aspiring players. Users subscribing to the platform receive fair, data-driven training and career advice at a far lower cost than with traditional agents. In turn, users share their data (demographics, experience, origin, physical fitness, preferences, …). This data will be used to match aspiring players with professional players who had similar stats in their early careers. The platform then recommends a shortlist of next career steps based on a database containing opportunities such as free training, etc.
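The matching of aspiring players to professionals with similar early-career stats could be sketched as a nearest-neighbor search over standardized features. This is only an illustration under stated assumptions: the feature set (age, 30 m sprint time, height), the player names, and the function names are hypothetical, and plain Euclidean distance stands in for whatever similarity measure the project eventually adopts.

```python
import math


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant to avoid division by zero.
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest_professionals(aspirant, professionals, k=2):
    """Return the names of the k professionals closest to the aspirant."""
    names = list(professionals)
    rows = [professionals[n] for n in names] + [aspirant]
    scaled = zscore_columns(rows)
    target = scaled[-1]
    dists = [(math.dist(target, vec), name) for vec, name in zip(scaled, names)]
    return [name for _, name in sorted(dists)[:k]]


# Hypothetical early-career stats: (age, 30 m sprint in seconds, height in cm).
professionals = {
    "pro_a": (17, 4.1, 178),
    "pro_b": (19, 4.6, 185),
    "pro_c": (18, 4.2, 176),
}
aspirant = (17, 4.15, 177)

print(nearest_professionals(aspirant, professionals, k=2))
```

Standardizing first keeps a feature with a large numeric range (height in cm) from dominating one with a small range (sprint time in seconds).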
An important aspect of fair representation is to provide players with an assessment relative to their competition. Valid recommendations could thus be: “bring your physical fitness up to be more competitive” or “Free for all training in Lusail this weekend! You have a realistic chance against the competition; the club is looking for your field position (left wing).” The main technical challenges include:
A. Data mining at scale (including player data at every stage of their careers, career opportunities, and the self-promotion and social media presences of aspiring players).
B. Unbiased, quantified assessment of players’ physical fitness at scale. Some of the widely accepted standardized fitness tests can be digitized using either sensors or biomechanically inspired video processing. However, both digitization paths assume the availability of funds and/or infrastructure, which are scarce resources for most players.
C. On a technical level, the recommender engine involves either aggregated clustering or link prediction in a heavily attributed dynamic graph, where nodes represent clubs or stepping stones and edges represent transfers.
D. Transformative aspects, e.g., whether sports technology is an apt means to provide lasting secondary value after a primary event such as the 2022 FIFA World Cup ends, the impact of finding previously hidden talent on the sport, and the potential for value creation across different segments of the transfer market (players receive fair representation, agencies receive semi-automated short-listing tools, and clubs receive better talent).
E. Feasibility and commercial impact evaluations.
We address the outlined challenges through three technical work packages addressing B, C, and D, and two industrial/engineering ones addressing A and B. These five work packages are “sandwiched” by two administrative/academic ones that handle project management and the exploitation of results. We will address A using web scrapers and cloud-based data collection services.
sKora QSTP-LLC has expertise in this area: they compiled a globally unique database comprising c. 900,000 international transfers. Physical fitness assessments for B are normally performed by professional coaches or agents, as anecdotal reports by aspiring players are rarely helpful. We propose to address the issue by having players film their exercises using readily available, rudimentarily calibrated smartphones. AI-based video processing then extracts the relevant spatio-temporal regions of a player’s action in the video, followed by fitting a key point model (“stick figure”). A biomechanical analysis will then compute an impartial, quantitative assessment. The first step of the related work package is mainly concerned with feasibility and robustness; a second step will explore how the analysis can be moved to “the edge” (the player’s smartphone), as there is no real-time requirement. We will also consider smart wearables such as shin protectors [https://soccerment.com], subject to cost and commercial availability. Addressing C, we will first evaluate existing recommendation architectures with respect to scalability and performance. Recent advances in graph convolutional networks developed by PI Dr. Ghanem, so-called deep GCNs, are of particular interest. Their application to dynamic graphs that model, e.g., the fluctuating transfer market is still largely unexplored and will provide fertile ground for research and IP creation. Treating this very project and the research therein as a case study related to D offers rich opportunities to study the potential of sports analytics and to collect data. The aim is to formulate general and exportable strategies that outline the successful transition of an event economy to lasting secondary revenue forms following the World Cup.
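One way to picture the biomechanical analysis that follows key point fitting is a joint-angle computation on the fitted stick figure. The sketch below is an assumption-laden illustration, not the project's method: the key point names (hip, knee, ankle), the 2D pixel coordinates, and the choice of minimum knee angle as a measure are all hypothetical.

```python
import math


def joint_angle(a, b, c):
    """Angle at point b, in degrees, formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 on |cross| and dot is numerically stable near 0 and 180 degrees.
    return math.degrees(math.atan2(abs(cross), dot))


def min_knee_angle(frames):
    """Smallest knee angle over a sequence of per-frame key points."""
    return min(joint_angle(f["hip"], f["knee"], f["ankle"]) for f in frames)


# Hypothetical key points in pixel coordinates (y grows downward).
frames = [
    {"hip": (0, 0), "knee": (0, 1), "ankle": (0, 2)},  # straight leg
    {"hip": (0, 1), "knee": (1, 1), "ankle": (1, 2)},  # knee bent
]

print(min_knee_angle(frames))
```

Because the angle depends only on relative positions, it is unaffected by where the player stands in the frame, which is helpful when cameras are only rudimentarily calibrated.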
The project’s industrial partner, sKora QSTP-LLC, will be instrumental in driving the industrial evaluation, E, using outcomes of this project and contributing expertise, manpower, access to data, and their network as in-kind contributions.
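The link prediction formulation named under challenge C can be illustrated with a classical neighborhood heuristic. The sketch below uses the Adamic-Adar index on a small, static, undirected graph; it is not the deep GCN approach the abstract refers to, and it ignores node attributes and time. The club names and transfer list are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_graph(transfers):
    """Build an undirected adjacency map from (source, destination) pairs."""
    neighbors = defaultdict(set)
    for src, dst in transfers:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    return neighbors


def adamic_adar(neighbors, u, v):
    """Adamic-Adar index: shared neighbors weighted by inverse log-degree."""
    shared = neighbors[u] & neighbors[v]
    return sum(1.0 / math.log(len(neighbors[w])) for w in shared)


def rank_missing_links(neighbors):
    """Score every non-adjacent node pair, highest score first."""
    scores = []
    for u, v in combinations(sorted(neighbors), 2):
        if v not in neighbors[u]:
            score = adamic_adar(neighbors, u, v)
            if score > 0:
                scores.append((score, u, v))
    return sorted(scores, reverse=True)


# Hypothetical transfers between clubs and stepping stones.
transfers = [
    ("academy_x", "club_a"),
    ("academy_x", "club_b"),
    ("club_a", "club_c"),
    ("club_b", "club_c"),
    ("club_c", "club_d"),
]

graph = build_graph(transfers)
for score, u, v in rank_missing_links(graph):
    print(f"{u} -> {v}: {score:.3f}")
```

A high score between two nodes that have no direct transfer yet suggests a plausible next step; a learned model over an attributed, time-varying graph would replace this heuristic in practice.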

Submitting Institute Name

Hamad Bin Khalifa University (HBKU)
Sponsor's Award Number: NPRP14S-0404-210137
Proposal ID: EX-QNRF-NPRPS-41
Status: Active
Effective start/end date: 15/11/22 → 2/07/25

Collaborative partners

Primary Theme

  • Artificial Intelligence

Primary Subtheme

  • AI - Smart Society

Secondary Theme

  • Artificial Intelligence

Secondary Subtheme

  • AI - Analytics & Decision Support

Keywords

  • Data analytics; Sports analytics; Sports industry;
  • Data Mining;
  • Artificial Intelligence
