Exploiting Convolutional Neural Networks for Phonotactic Based Dialect Identification

Maryam Najafian, Sameer Khurana, Suwon Shon, Ahmed Ali, James Glass

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

39 Citations (Scopus)

Abstract

In this paper, we investigate different approaches to Dialect Identification (DID) in Arabic broadcast speech. Dialects differ in their inventories of phonological segments. This paper proposes a new phonotactic-based feature representation that discriminates among different occurrences of the same phone n-gram according to their phone duration and probability statistics. To achieve further gains in accuracy, we use multilingual phone recognizers trained separately on Arabic, English, Czech, Hungarian and Russian. We use Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. The final system fusion yields relative error rate reductions of 24.7% and 19.0% compared with a conventional phonotactic DID system and an i-vector system with bottleneck features, respectively.
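The abstract does not spell out how duration and probability statistics are folded into the phonotactic features. As a rough illustration only, the Python sketch below assumes one plausible scheme: each recognized phone is tagged with a quantized duration bin and a quantized posterior-probability bin before n-grams are counted, so two occurrences of the same phone n-gram with different statistics become distinct features. The segment format, the helpers `quantize` and `augmented_ngrams`, and the bin edges are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical phone-recognizer output: (phone label, duration in frames, posterior probability).
Segment = Tuple[str, int, float]


def quantize(value: float, edges: Sequence[float]) -> int:
    """Return the index of the bin that `value` falls into, given ascending bin edges."""
    for i, edge in enumerate(edges):
        if value < edge:
            return i
    return len(edges)


def augmented_ngrams(segments: List[Segment], n: int = 2,
                     dur_edges: Sequence[float] = (5, 10, 20),
                     prob_edges: Sequence[float] = (0.5, 0.8)) -> Counter:
    """Count phone n-grams whose units carry duration and probability bin tags.

    Assumption: the bin edges here are illustrative, not values from the paper.
    """
    # Tag every phone with its duration bin (d*) and posterior-probability bin (p*).
    units = [
        f"{phone}|d{quantize(dur, dur_edges)}|p{quantize(prob, prob_edges)}"
        for phone, dur, prob in segments
    ]
    # Slide an n-unit window over the tagged sequence and count each n-gram.
    counts: Counter = Counter()
    for i in range(len(units) - n + 1):
        counts[" ".join(units[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    segs = [("a", 4, 0.9), ("b", 12, 0.6), ("a", 25, 0.3), ("b", 12, 0.6)]
    # The plain bigram "a b" occurs twice, but here it splits into two distinct features.
    print(augmented_ngrams(segs, n=2))
```

In such a scheme the resulting sparse counts could be vectorized and passed to an SVM, or arranged as input to a CNN backend; the exact input layout used in the paper is not described here.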

Original language: English
Title of host publication: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5174-5178
Number of pages: 5
ISBN (Print): 9781538646588
Publication status: Published - 10 Sept 2018
Event: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: 15 Apr 2018 to 20 Apr 2018

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2018-April
ISSN (Print): 1520-6149

Conference

Conference: 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country/Territory: Canada
City: Calgary
Period: 15/04/18 to 20/04/18

Keywords

  • CNN
  • Dialect identification
  • Phonotactics
