TY - GEN
T1 - Part-Based Feature Aggregation Method for Dynamic Scene Recognition
AU - Peng, Xiaoming
AU - Bouzerdoum, Abdesselam
N1 - Publisher Copyright:
© 2019 IEEE.
PY - 2019/12
Y1 - 2019/12
AB - Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed for aggregating local features from multiple video frames. A pre-trained Fast R-CNN model is used to extract local convolutional layer features from the regions of interest (ROIs) of training images. These features are then clustered to locate representative parts. A set cover problem is formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN model. Local convolutional layer features and fully-connected layer features are extracted using the fine-tuned Fast R-CNN model, and then aggregated separately over a video segment to form two feature representations. The two representations are concatenated into a single global feature representation. Experimental results show that the proposed method outperforms several state-of-the-art features on two dynamic scene datasets.
KW - deep neural networks
KW - dynamic scene recognition
KW - feature aggregation
KW - video classification
UR - http://www.scopus.com/inward/record.url?scp=85078699046&partnerID=8YFLogxK
U2 - 10.1109/DICTA47822.2019.8946036
DO - 10.1109/DICTA47822.2019.8946036
M3 - Conference contribution
AN - SCOPUS:85078699046
T3 - 2019 Digital Image Computing: Techniques and Applications, DICTA 2019
BT - 2019 Digital Image Computing: Techniques and Applications, DICTA 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 International Conference on Digital Image Computing: Techniques and Applications, DICTA 2019
Y2 - 2 December 2019 through 4 December 2019
ER -