TY - GEN
T1 - Video Classification Based on Spatial Gradient and Optical Flow Descriptors
AU - Tang, Xiaolin
AU - Bouzerdoum, Abdesselam
AU - Phung, Son Lam
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015
Y1 - 2015
N2 - Feature point detection and local feature extraction are the two critical steps in trajectory-based methods for video classification. This paper proposes to detect trajectories by tracking spatiotemporal feature points in salient regions rather than across the entire frame. This strategy significantly reduces noisy feature points in the background region, leading to lower computational cost and a more discriminative feature set. Two new spatiotemporal descriptors, namely STOH and RISTOH, are proposed to describe the spatiotemporal characteristics of moving objects. The proposed method for feature point detection and local feature extraction is applied to human action recognition. It is evaluated on three video datasets: KTH, YouTube, and Hollywood2. The results show that the proposed method achieves a higher classification rate, even when it uses only half as many feature points as the dense sampling approach. Moreover, features extracted from the curvature of the motion surface are more discriminative than features extracted from the spatial gradient.
AB - Feature point detection and local feature extraction are the two critical steps in trajectory-based methods for video classification. This paper proposes to detect trajectories by tracking spatiotemporal feature points in salient regions rather than across the entire frame. This strategy significantly reduces noisy feature points in the background region, leading to lower computational cost and a more discriminative feature set. Two new spatiotemporal descriptors, namely STOH and RISTOH, are proposed to describe the spatiotemporal characteristics of moving objects. The proposed method for feature point detection and local feature extraction is applied to human action recognition. It is evaluated on three video datasets: KTH, YouTube, and Hollywood2. The results show that the proposed method achieves a higher classification rate, even when it uses only half as many feature points as the dense sampling approach. Moreover, features extracted from the curvature of the motion surface are more discriminative than features extracted from the spatial gradient.
UR - http://www.scopus.com/inward/record.url?scp=84963642414&partnerID=8YFLogxK
U2 - 10.1109/DICTA.2015.7371319
DO - 10.1109/DICTA.2015.7371319
M3 - Conference contribution
AN - SCOPUS:84963642414
T3 - 2015 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015
BT - 2015 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015
Y2 - 23 November 2015 through 25 November 2015
ER -
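
Note: the abstract above refers to descriptors built from optical flow along tracked trajectories. As a rough illustration only, the sketch below computes a generic magnitude-weighted histogram of optical-flow orientations over a local patch; it is not the paper's STOH or RISTOH formulation, and the function name, patch size, bin count, and Farneback parameters are all assumptions made for the example.

# Illustrative sketch only: a generic histogram of optical-flow orientations
# over a local patch. NOT the STOH/RISTOH descriptors from the cited paper;
# patch size, bin count, and Farneback parameters are assumed values.
import cv2
import numpy as np

def flow_orientation_histogram(prev_gray, next_gray, center, patch=32, bins=8):
    """Quantize optical-flow directions in a patch around `center` into a histogram."""
    # Dense optical flow between two consecutive grayscale frames (Farneback method).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x, y = center
    h = patch // 2
    local = flow[max(y - h, 0):y + h, max(x - h, 0):x + h]   # (H, W, 2) flow vectors
    mag = np.linalg.norm(local, axis=2)                       # flow magnitude per pixel
    ang = np.arctan2(local[..., 1], local[..., 0])            # flow orientation in [-pi, pi]
    # Magnitude-weighted orientation histogram, L1-normalized.
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

In a trajectory-based pipeline of the kind the abstract describes, a descriptor like this would typically be computed at the points tracked along each trajectory and the per-point histograms concatenated or pooled into the final feature vector.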