Actor-Aware Self-Supervised Learning for Semi-Supervised Video Representation Learning

Maregu Assefa, Wei Jiang*, Kumie Gedamu Alemu, Getinet Yilma, Deepak Adhikari, Melese Ayalew, Abegaz Mohammed Seid, Aiman Erbad

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Self-supervised contrastive learning has significantly improved performance on action recognition tasks by discovering useful signals in unlabeled videos. Nevertheless, the characteristics of existing video benchmark datasets bias the learned video representations toward dominant backgrounds and scene correlations, which ultimately leads to poor generalization on scene-invariant action recognition. We therefore propose Actor-aware Self-supervised Learning for Semi-supervised Video Representation Learning (ActorSL). We align localized actors with their corresponding scene information to encourage the model to learn discriminative regions and to reduce its dependency on the video background during contrastive training. Furthermore, we present an inter-video Background Mixing (iBM) augmentation strategy to introduce scene consistency into the model. For iBM, we patch together inter-video crops of four randomly selected frames to create a unique frame for each video. The patched frame is blended with the target video frames to generate a spatially augmented sample. The actor-scene aligned features and the features of iBM-augmented videos are then used to jointly optimize a contrastive loss and consistency regularization in a semi-supervised way. Moreover, iBM combines the one-hot-encoded labels of the patches with the label of the target video as a label smoothing regularizer that softens the decision boundaries of the semi-supervised model. Our experimental results show that ActorSL notably improves on current state-of-the-art semi-supervised methods on the Kinetics-400, UCF101, and HMDB51 datasets under a low-label regime.
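
The iBM steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 2x2 patch layout, the half-size random crops, the fixed blending weight `lam`, the equal weighting of the four patch labels, and the function names `make_patched_frame` and `ibm_augment` are all assumptions made for illustration, and labels (or pseudo-labels) are assumed to be available for every clip involved.

```python
import numpy as np


def make_patched_frame(frames, rng):
    """Tile one random crop from each of four frames into a 2x2 grid.

    The 2x2 layout and half-size crops are assumptions; the abstract only
    states that crops of four randomly selected frames are patched together.
    """
    assert len(frames) == 4, "expects exactly four donor frames"
    h, w, c = frames[0].shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even height and width"
    ph, pw = h // 2, w // 2
    patched = np.zeros((h, w, c), dtype=np.float32)
    for idx, frame in enumerate(frames):
        # Pick a random top-left corner for a (ph, pw) crop of this frame.
        top = int(rng.integers(0, h - ph + 1))
        left = int(rng.integers(0, w - pw + 1))
        row, col = divmod(idx, 2)
        patched[row * ph:(row + 1) * ph, col * pw:(col + 1) * pw] = (
            frame[top:top + ph, left:left + pw]
        )
    return patched


def ibm_augment(video, video_label, donor_frames, donor_labels,
                num_classes, lam=0.8, rng=None):
    """Blend a patched background frame into every frame of `video`.

    video: float array of shape (T, H, W, C).
    lam: assumed blending weight kept by the target video (hypothetical value).
    Returns the mixed clip and a soft label that sums to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    patched = make_patched_frame(donor_frames, rng)
    # Broadcast the single patched frame across the time axis.
    mixed = lam * video + (1.0 - lam) * patched[None]
    eye = np.eye(num_classes, dtype=np.float32)
    # Average the one-hot labels of the four patches (equal weights assumed).
    donor_soft = eye[list(donor_labels)].mean(axis=0)
    soft_label = lam * eye[video_label] + (1.0 - lam) * donor_soft
    return mixed.astype(np.float32), soft_label


rng = np.random.default_rng(0)
video = rng.random((8, 32, 32, 3), dtype=np.float32)
donors = [rng.random((32, 32, 3), dtype=np.float32) for _ in range(4)]
mixed, soft = ibm_augment(video, 2, donors, [0, 1, 2, 3], num_classes=5, rng=rng)
```

In this sketch the soft label keeps most of its mass on the target class while spreading the remainder over the classes of the four patches, which is one simple way to realize the label-smoothing effect the abstract describes.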
Original language: English
Pages (from-to): 6679-6692
Number of pages: 14
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 11
Publication status: Published - 1 Nov 2023

Keywords

  • Action recognition
  • Actor-aware pseudo-labeling
  • Contrastive learning
  • Inter-video background mixing
  • Semi-supervised learning
