A Hybrid Transformer Framework for Efficient Activity Recognition Using Consumer Electronics

Altaf Hussain, Samee Ullah Khan, Noman Khan, Mohammed Wasim Bhatt, Ahmed Farouk, Jyoti Bhola, Sung Wook Baik*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

8 Citations (Scopus)

Abstract

In the field of research on wireless visual sensor networks, human activity recognition (HAR) using consumer electronics is now an emerging research area in both the academic and industrial sectors, with a diverse range of applications. However, the implementation of HAR through computer vision methods is highly challenging on consumer electronic devices, due to their limited computational capabilities. This means that mainstream approaches in which computationally complex contextual networks and variants of recurrent neural networks are used to learn long-range spatiotemporal dependencies have achieved limited performance. To address these challenges, this paper presents an efficient framework for robust HAR for consumer electronics devices, which is divided into two main stages. In the first stage, convolutional features from the multiply-17 layer of a lightweight MobileNetV3 are employed to balance the computational complexity and extract the most salient contextual features (7 × 7 × 576 × 30) from each video. In the second stage, a sequential residual transformer network (SRTN) is designed in a residual fashion to effectively learn the long-range temporal dependencies across multiple video frames. The temporal multi-head self-attention module and residual strategy of the SRTN enable the proposed method to discard non-relevant features and to optimise the spatiotemporal feature vector for efficient HAR. The performance of the proposed model is evaluated on three challenging HAR datasets, and is found to yield high levels of accuracy of 76.1428%, 96.6399%, and 97.3130% on the HMDB51, UCF101, and UCF50 datasets, respectively, outperforming a state-of-the-art method for HAR.
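The two-stage data flow described in the abstract can be illustrated with a small shape-level sketch. This is not the authors' SRTN: the random input stands in for MobileNetV3 features, and the spatial average pooling, the choice of 8 heads, the random projection weights, and the final mean over frames are all assumptions made only to show how a 7 × 7 × 576 × 30 feature tensor could pass through one temporal multi-head self-attention block with a residual connection.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head self-attention over the frame axis. x has shape (T, D)."""
    T, D = x.shape
    d_head = D // num_heads

    def split(m):
        # (T, D) -> (heads, T, d_head)
        return m.reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, T, T)
    weights = softmax(scores, axis=-1)                   # each frame attends to all frames
    context = weights @ v                                # (heads, T, d_head)
    merged = context.transpose(1, 0, 2).reshape(T, D)    # concatenate heads
    return merged @ w_o, weights

def residual_attention_block(x, params, num_heads):
    # Residual strategy: add the attention output back onto its input.
    attn_out, weights = temporal_self_attention(x, *params, num_heads=num_heads)
    return x + attn_out, weights

rng = np.random.default_rng(0)

# Stage 1 stand-in: random values in place of MobileNetV3 features.
# Stored here as (frames, height, width, channels) = (30, 7, 7, 576).
T, H, W, C = 30, 7, 7, 576
features = rng.standard_normal((T, H, W, C)).astype(np.float32)

# Assumption: average over the 7 x 7 grid to get one 576-d token per frame.
tokens = features.mean(axis=(1, 2))  # (30, 576)

# Stage 2 stand-in: one residual attention block with hypothetical random weights.
params = [rng.standard_normal((C, C)).astype(np.float32) * 0.02 for _ in range(4)]
out, attn = residual_attention_block(tokens, params, num_heads=8)

# Assumption: mean over frames yields a single video-level vector for a classifier.
video_vector = out.mean(axis=0)
print(tokens.shape, out.shape, attn.shape, video_vector.shape)
```

With these assumed settings, each of the 8 heads works on 72 of the 576 channels and produces a 30 × 30 attention map over frames, which is where long-range temporal dependencies between frames would be weighted. A real implementation would add learned weights, normalisation, a feed-forward sublayer, and a classification head.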

Original language: English
Pages (from-to): 6800-6807
Number of pages: 8
Journal: IEEE Transactions on Consumer Electronics
Volume: 70
Issue number: 4
Publication status: Published - 2024
Externally published: Yes

Keywords

  • Human action recognition
  • consumer electronics
  • surveillance system
  • transformer network
  • video classification
  • wireless visual sensor networks
