TY - GEN
T1 - Speaker identification system under noisy conditions
AU - Alam, Md Shariful
AU - Zilany, Muhammad S.A.
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/9
Y1 - 2019/9
AB - Speaker identification (SID) systems need to be robust to extrinsic variations in the speech signal, such as background noise, to be applicable in many real-life scenarios. Mel-frequency cepstral coefficient (MFCC)-based i-vector systems have been regarded as the state-of-the-art technique for speaker identification, but it is well known that the performance of traditional methods, in which features are mostly extracted from the properties of the acoustic signal, degrades substantially under noisy conditions. This study proposes a robust SID system using the neural responses of a physiologically based computational model of the auditory periphery. The 2-D neurograms were constructed from the simulated responses of the auditory-nerve fibers to speech signals from the TIMIT database. The neurogram coefficients were trained using i-vector-based systems to generate an identity model for each speaker, and performance was evaluated in quiet and under noisy conditions and compared with the results from existing methods such as MFCC, frequency-domain linear prediction (FDLP), and Gammatone frequency cepstral coefficients (GFCC). Results showed that the proposed system outperformed all existing acoustic-signal-based methods both in quiet and under noisy conditions.
KW - AN model
KW - I-vector
KW - Neurogram
KW - Noisy conditions
KW - Speaker identification systems
UR - http://www.scopus.com/inward/record.url?scp=85079355765&partnerID=8YFLogxK
U2 - 10.1109/ICAEE48663.2019.8975420
DO - 10.1109/ICAEE48663.2019.8975420
M3 - Conference contribution
AN - SCOPUS:85079355765
T3 - 2019 5th International Conference on Advances in Electrical Engineering, ICAEE 2019
SP - 566
EP - 569
BT - 2019 5th International Conference on Advances in Electrical Engineering, ICAEE 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 5th International Conference on Advances in Electrical Engineering, ICAEE 2019
Y2 - 26 September 2019 through 28 September 2019
ER -