A part-based spatial and temporal aggregation method for dynamic scene recognition

Xiaoming Peng*, Abdesselam Bouzerdoum, Son Lam Phung

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

Existing methods for dynamic scene recognition mostly use global features extracted from the entire video frame or a video segment. In this paper, a part-based method is proposed to aggregate local features from video frames. A pre-trained Fast R-CNN model is used to extract local convolutional features from the regions of interest of training images. These features are clustered to locate representative parts. A set cover problem is then formulated to select the discriminative parts, which are further refined by fine-tuning the Fast R-CNN model. Local features from a video segment are extracted at different layers of the fine-tuned Fast R-CNN model and aggregated both spatially and temporally. Extensive experiments show that the proposed method is highly competitive with state-of-the-art approaches.
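Two steps in the abstract lend themselves to a concrete sketch: selecting discriminative parts by set cover, and pooling local features first spatially and then temporally. The Python sketch below is not the authors' implementation; the cosine-similarity coverage criterion, the threshold value, and all names (cover_matrix, greedy_set_cover, aggregate_segment) are illustrative assumptions.

import numpy as np

def cover_matrix(part_feats, image_feats, thresh=0.7):
    """Boolean matrix C[p, i] = True if part p 'covers' image i, i.e. some
    local feature of image i is cosine-similar to part p above `thresh`.
    part_feats: (P, D) part prototypes; image_feats: list of (R_i, D) arrays."""
    P = part_feats / np.linalg.norm(part_feats, axis=1, keepdims=True)
    C = np.zeros((len(P), len(image_feats)), dtype=bool)
    for i, F in enumerate(image_feats):
        F = F / np.linalg.norm(F, axis=1, keepdims=True)
        C[:, i] = (P @ F.T).max(axis=1) >= thresh
    return C

def greedy_set_cover(C):
    """Greedily pick parts (rows of C) until every image (column) is covered,
    or no part covers any new image. Returns the selected row indices."""
    uncovered = np.ones(C.shape[1], dtype=bool)
    selected = []
    while uncovered.any():
        gains = (C & uncovered).sum(axis=1)   # new images each part would cover
        best = int(gains.argmax())
        if gains[best] == 0:                  # remaining images are uncoverable
            break
        selected.append(best)
        uncovered &= ~C[best]
    return selected

def aggregate_segment(frame_feats):
    """Spatial max-pooling within each frame, then temporal averaging over
    the segment. frame_feats: list of (R_t, D) per-frame local features."""
    pooled = np.stack([F.max(axis=0) for F in frame_feats])  # (T, D)
    return pooled.mean(axis=0)                               # (D,)

# Toy usage with random features standing in for Fast R-CNN ROI features.
rng = np.random.default_rng(0)
parts = rng.normal(size=(20, 64))
images = [rng.normal(size=(rng.integers(5, 15), 64)) for _ in range(30)]
chosen = greedy_set_cover(cover_matrix(parts, images, thresh=0.3))
print("selected parts:", chosen)
print("segment descriptor shape:", aggregate_segment(images[:8]).shape)

The greedy rule is the standard logarithmic-factor approximation for set cover; the paper's exact formulation, coverage criterion, and aggregation layers may differ from this sketch.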

Original language: English
Pages (from-to): 7353-7370
Number of pages: 18
Journal: Neural Computing and Applications
Volume: 33
Issue number: 13
DOIs
Publication status: Published - Jul 2021

Keywords

  • Deep neural networks
  • Dynamic scene recognition
  • Feature aggregation
  • Part-based models

