Automatic classification of speech overlaps: Feature representation and algorithms

Shammur Absar Chowdhury*, Evgeny A. Stepanov, Morena Danieli, Giuseppe Riccardi

*Corresponding author for this work

Research output: Contribution to journal › Article › Peer-review

13 Citations (Scopus)

Abstract

Overlapping speech is a natural and frequent phenomenon in human–human conversation, and it usually serves an underlying purpose. Speech overlaps can be categorized as competitive or non-competitive: the former is an attempt to take the floor, whereas the latter is an attempt to help the current speaker continue the turn. The presence and distribution of these categories are indicative of the speakers’ states during the conversation. Understanding these manifestations is therefore crucial both for conversational analysis and for modeling human–machine dialogs. The goal of this study is to design computational models that classify overlapping speech segments in dyadic conversations as competitive or non-competitive acts, using lexical and acoustic cues as well as their surrounding context. The designed overlap representations are evaluated with a linear model, Support Vector Machines (SVM), and with non-linear neural network models: feed-forward (FFNN), convolutional (CNN) and long short-term memory (LSTM) networks. We experiment with lexical and acoustic representations from both speaker channels, and with their combinations in both feature space and hidden space. We observe that lexical word-embedding features significantly increase the overall F1-measure compared to both acoustic and bag-of-ngrams lexical representations, suggesting that lexical information can be used as a powerful cue for overlap classification. Our comparative study shows that the best computational architecture is an FFNN that uses a combination of word embeddings and acoustic features.
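To make the best-performing setup described above more concrete, the following is a minimal sketch of feature-level fusion of word embeddings and acoustic features from both speaker channels, fed to a one-hidden-layer feed-forward network, plus an F1 computation. It is not the authors' implementation: the toy embedding table, the two acoustic values per channel, the layer sizes, the untrained random weights, and the use of macro-averaged F1 are all hypothetical choices made for illustration. Fusion in hidden space and the SVM, CNN and LSTM models are not shown.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical embedding size; a real system would load pretrained word vectors.
EMB_DIM = 4


def mean_embedding(tokens: Sequence[str], table: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of in-vocabulary tokens; return zeros if none are known."""
    vecs = [table[t] for t in tokens if t in table]
    if not vecs:
        return [0.0] * EMB_DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fuse(tokens_a: Sequence[str], tokens_b: Sequence[str],
         acoustic_a: Sequence[float], acoustic_b: Sequence[float],
         table: Dict[str, List[float]]) -> List[float]:
    """Feature-level fusion: concatenate lexical and acoustic vectors of both channels."""
    return (mean_embedding(tokens_a, table) + mean_embedding(tokens_b, table)
            + list(acoustic_a) + list(acoustic_b))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TinyFFNN:
    """One ReLU hidden layer and a sigmoid output giving P(competitive).

    Weights are random and untrained here; training (e.g. backpropagation) is omitted.
    """

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def predict_proba(self, x: Sequence[float]) -> float:
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)


def f1(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float:
    """F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str],
             labels: Sequence[str] = ("competitive", "non-competitive")) -> float:
    """Unweighted mean of per-class F1 (this averaging scheme is an assumption)."""
    return sum(f1(y_true, y_pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    table = {"yeah": [0.1, 0.2, 0.0, 0.3], "wait": [0.5, -0.1, 0.2, 0.0]}
    # Hypothetical per-channel acoustic values, e.g. normalized pitch and intensity means.
    x = fuse(["yeah"], ["wait", "no"], [0.7, 0.1], [0.9, 0.4], table)
    model = TinyFFNN(n_in=len(x), n_hidden=8)
    p = model.predict_proba(x)
    print("competitive" if p >= 0.5 else "non-competitive", round(p, 3))
```

Running the script prints a label and a probability. Because the weights are untrained, the prediction itself is arbitrary; the sketch only shows how the fused lexical and acoustic vector flows through a feed-forward classifier.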

Original language: English
Pages (from-to): 145-167
Number of pages: 23
Journal: Computer Speech and Language
Volume: 55
Publication status: Published, May 2019
Externally published: Yes

Keywords

  • Acoustic
  • Deep learning
  • Lexical
  • Overlap
  • Spoken conversation
