TY - JOUR
T1 - Radon transform of auditory neurograms
T2 - A robust feature set for phoneme classification
AU - Alam, Md Shariful
AU - Jassim, Wissam A.
AU - Zilany, Muhammad S.A.
N1 - Publisher Copyright: © The Institution of Engineering and Technology 2017.
PY - 2018/5/1
Y1 - 2018/5/1
N2 - Classification of speech phonemes is challenging, especially in noisy environments, and traditional speech recognition systems therefore do not perform well in the presence of noise. Unlike traditional methods, in which features are mostly extracted from the properties of the acoustic signal, this study proposes a new feature for phoneme classification using neural responses from a physiologically based computational model of the auditory periphery. A two-dimensional neurogram was constructed from the simulated responses of auditory-nerve fibres to speech phonemes. Features of the neurogram images were extracted using the discrete Radon transform, and the dimensionality of the features was reduced using an efficient feature selection technique. A standard classifier, the support vector machine, was employed to model and test the phoneme classes. Classification performance was evaluated in quiet and under noisy conditions in which the test data were corrupted with various environmental distortions such as additive noise, room reverberation, and telephone-channel noise. Performance was also compared with results from existing phoneme classification methods based on Mel-frequency cepstral coefficients, Gammatone frequency cepstral coefficients, and frequency-domain linear prediction. In general, the proposed neural feature achieved better classification accuracy in quiet and under noisy conditions than most existing acoustic-signal-based methods.
AB - Classification of speech phonemes is challenging, especially in noisy environments, and traditional speech recognition systems therefore do not perform well in the presence of noise. Unlike traditional methods, in which features are mostly extracted from the properties of the acoustic signal, this study proposes a new feature for phoneme classification using neural responses from a physiologically based computational model of the auditory periphery. A two-dimensional neurogram was constructed from the simulated responses of auditory-nerve fibres to speech phonemes. Features of the neurogram images were extracted using the discrete Radon transform, and the dimensionality of the features was reduced using an efficient feature selection technique. A standard classifier, the support vector machine, was employed to model and test the phoneme classes. Classification performance was evaluated in quiet and under noisy conditions in which the test data were corrupted with various environmental distortions such as additive noise, room reverberation, and telephone-channel noise. Performance was also compared with results from existing phoneme classification methods based on Mel-frequency cepstral coefficients, Gammatone frequency cepstral coefficients, and frequency-domain linear prediction. In general, the proposed neural feature achieved better classification accuracy in quiet and under noisy conditions than most existing acoustic-signal-based methods.
UR - http://www.scopus.com/inward/record.url?scp=85046123805&partnerID=8YFLogxK
U2 - 10.1049/iet-spr.2017.0170
DO - 10.1049/iet-spr.2017.0170
M3 - Article
AN - SCOPUS:85046123805
SN - 1751-9675
VL - 12
SP - 260
EP - 268
JO - IET Signal Processing
JF - IET Signal Processing
IS - 3
ER -