TY - GEN
T1 - Combined gesture-speech analysis and speech driven gesture synthesis
AU - Sargin, M. E.
AU - Aran, O.
AU - Karpov, A.
AU - Ofli, F.
AU - Yasinnik, Y.
AU - Wilson, S.
AU - Erzin, E.
AU - Yemez, Y.
AU - Tekalp, A. M.
PY - 2006
Y1 - 2006
N2 - Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modeling of head, hand, and arm gestures of a speaker have been studied extensively, and these gestures were shown to carry linguistic information. A typical example is the head gesture while saying "yes/no". In this study, the correlation between gestures and speech is investigated. In speech signal analysis, keyword spotting and prosodic accent event detection have been performed. In gesture analysis, hand positions and parameters of global head motion are used as features. The detection of gestures is based on discrete predesignated symbol sets, which are manually labeled during the training phase. The gesture-speech correlation is modeled by examining the co-occurring speech and gesture patterns. This correlation can be used to fuse gesture and speech modalities for edutainment applications (e.g., video games, 3-D animations) where natural gestures of talking avatars are animated from speech. A speech-driven gesture animation example has been implemented for demonstration.
AB - Multimodal speech and speaker modeling and recognition are widely accepted as vital aspects of state-of-the-art human-machine interaction systems. While correlations between speech and lip motion as well as speech and facial expressions are widely studied, relatively little work has been done to investigate the correlations between speech and gesture. Detection and modeling of head, hand, and arm gestures of a speaker have been studied extensively, and these gestures were shown to carry linguistic information. A typical example is the head gesture while saying "yes/no". In this study, the correlation between gestures and speech is investigated. In speech signal analysis, keyword spotting and prosodic accent event detection have been performed. In gesture analysis, hand positions and parameters of global head motion are used as features. The detection of gestures is based on discrete predesignated symbol sets, which are manually labeled during the training phase. The gesture-speech correlation is modeled by examining the co-occurring speech and gesture patterns. This correlation can be used to fuse gesture and speech modalities for edutainment applications (e.g., video games, 3-D animations) where natural gestures of talking avatars are animated from speech. A speech-driven gesture animation example has been implemented for demonstration.
UR - http://www.scopus.com/inward/record.url?scp=34247646607&partnerID=8YFLogxK
U2 - 10.1109/ICME.2006.262663
DO - 10.1109/ICME.2006.262663
M3 - Conference contribution
AN - SCOPUS:34247646607
SN - 1424403677
SN - 9781424403677
T3 - 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings
SP - 893
EP - 896
BT - 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings
T2 - 2006 IEEE International Conference on Multimedia and Expo, ICME 2006
Y2 - 9 July 2006 through 12 July 2006
ER -