Speech Representation Analysis Based on Inter- and Intra-Model Similarities

Yassine El Kheir*, Ahmed Ali, Shammur Absar Chowdhury

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Self-supervised learning (SSL) models have revolutionized speech processing, achieving new levels of performance in a wide variety of tasks with limited resources. However, the inner workings of these models are still opaque. In this paper, we analyze the encoded contextual representations of these foundation models based on their inter- and intra-model similarity, independent of any external annotation or task-specific constraint. We examine different SSL models, varying their training paradigm (contrastive: Wav2Vec2.0; predictive: HuBERT) and model size (base and large). We explore these models at different levels of localization/distributivity of information, including (i) individual neurons, (ii) layer representations, and (iii) attention weights, and (iv) compare the representations with their fine-tuned counterparts. Our results highlight that these models converge to similar representation subspaces but not to similar neuron-localized concepts. To facilitate further research, we have publicly released our code.
Original language: English
Title of host publication: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 848-852
Number of pages: 5
ISBN (Electronic): 9798350374513
ISBN (Print): 979-8-3503-7452-0
DOIs
Publication status: Published - 2024
Event: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Seoul, Korea, Republic of
Duration: 14 Apr 2024 to 19 Apr 2024

Publication series

Name: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024 - Proceedings

Conference

Conference: 49th IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, ICASSPW 2024
Country/Territory: Korea, Republic of
City: Seoul
Period: 14/04/24 to 19/04/24

Keywords

  • Inter- and Intra- Similarities
  • Self-Supervised Learning
  • Speech Models
