TY - GEN
T1 - CLASP
T2 - 47th European Conference on Information Retrieval, ECIR 2025
AU - Abootorabi, Mohammad Mahdi
AU - Asgari, Ehsaneddin
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025/4/3
Y1 - 2025/4/3
AB - This study introduces CLASP (Contrastive Language-Speech Pretraining), a multilingual, multimodal representation tailored for audio-text information retrieval. CLASP leverages the synergy between spoken content and textual data. During training, we utilize our newly introduced speech-text dataset, which encompasses 15 diverse categories ranging from fiction to religion. CLASP’s audio component integrates audio spectrograms with a pre-trained self-supervised speech model, while its language encoding counterpart employs a sentence encoder pre-trained on over 100 languages. This unified lightweight model bridges the gap between various modalities and languages, enhancing its effectiveness in handling and retrieving multilingual and multimodal data. Our evaluations across multiple languages demonstrate that CLASP establishes new benchmarks in HITS@1, MRR, and meanR metrics, outperforming traditional ASR-based retrieval methods that rely on transcribing speech into text for subsequent text retrieval, especially in specific scenarios.
KW - Contrastive Learning
KW - Multimodal IR
KW - Speech Retrieval
UR - http://www.scopus.com/inward/record.url?scp=105006508338&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-88717-8_2
DO - 10.1007/978-3-031-88717-8_2
M3 - Conference contribution
AN - SCOPUS:105006508338
SN - 9783031887161
T3 - Lecture Notes in Computer Science
SP - 10
EP - 20
BT - Advances in Information Retrieval - 47th European Conference on Information Retrieval, ECIR 2025, Proceedings
A2 - Hauff, Claudia
A2 - Macdonald, Craig
A2 - Jannach, Dietmar
A2 - Kazai, Gabriella
A2 - Nardini, Franco Maria
A2 - Pinelli, Fabio
A2 - Silvestri, Fabrizio
A2 - Tonellotto, Nicola
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 6 April 2025 through 10 April 2025
ER -