TY - GEN
T1 - Violent Scene Detection Using a Super Descriptor Tensor Decomposition
AU - Khokher, Muhammad Rizwan
AU - Bouzerdoum, Abdesselam
AU - Phung, Son Lam
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2015
Y1 - 2015
N2 - This article presents a new method for violent scene detection using super descriptor tensor decomposition. Multi-modal local features comprising auditory and visual features are extracted from Mel-frequency cepstral coefficients (including first- and second-order derivatives) and refined dense trajectories. There is usually a large number of dense trajectories extracted from a video sequence; some of these trajectories are unnecessary and can affect the accuracy. We propose to refine the dense trajectories by selecting only discriminative trajectories in the region of interest. Visual descriptors consisting of oriented gradient and motion boundary histograms are computed along the refined dense trajectories. In traditional bag-of-visual-words techniques, the feature descriptors are concatenated to form a single large feature vector for classification. This destroys the spatio-temporal interactions among features extracted from multi-modal data. To address this problem, a super descriptor tensor decomposition is proposed. The extracted feature descriptors are first encoded using the super descriptor vector method. Then the encoded features are arranged as tensors so as to retain the spatio-temporal structure of the features. To obtain a compact set of features for classification, the Tucker-3 decomposition is applied to the super descriptor tensors, followed by feature selection using Fisher feature ranking. The obtained features are fed to a support vector machine classifier. Experimental evaluation is performed on the violence detection benchmark dataset MediaEval VSD2014. The proposed method outperforms most of the state-of-the-art methods, achieving MAP2014 scores of 60.2% and 67.8% on two subsets of the dataset.
KW - Refined dense trajectories
KW - Super descriptor vector
KW - Support vector machines
KW - Tensor decomposition
KW - Violent scene detection
UR - http://www.scopus.com/inward/record.url?scp=84963649687&partnerID=8YFLogxK
U2 - 10.1109/DICTA.2015.7371320
DO - 10.1109/DICTA.2015.7371320
M3 - Conference contribution
AN - SCOPUS:84963649687
T3 - 2015 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015
BT - 2015 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - International Conference on Digital Image Computing: Techniques and Applications, DICTA 2015
Y2 - 23 November 2015 through 25 November 2015
ER -