TY - JOUR
T1 - A Hybrid Transformer Framework for Efficient Activity Recognition Using Consumer Electronics
AU - Hussain, Altaf
AU - Khan, Samee Ullah
AU - Khan, Noman
AU - Bhatt, Mohammed Wasim
AU - Farouk, Ahmed
AU - Bhola, Jyoti
AU - Baik, Sung Wook
N1 - Publisher Copyright:
© 1975-2011 IEEE.
PY - 2024
Y1 - 2024
N2 - In the field of research on wireless visual sensor networks, human activity recognition (HAR) using consumer electronics is an emerging research area in both the academic and industrial sectors, with a diverse range of applications. However, implementing HAR through computer vision methods is highly challenging on consumer electronic devices due to their limited computational capabilities. As a result, mainstream approaches that rely on computationally complex contextual networks and variants of recurrent neural networks to learn long-range spatiotemporal dependencies have achieved limited performance. To address these challenges, this paper presents an efficient framework for robust HAR on consumer electronics devices, which is divided into two main stages. In the first stage, convolutional features from the multiply-17 layer of a lightweight MobileNetV3 are employed to balance computational complexity and extract the most salient contextual features (7×7×576×30) from each video. In the second stage, a sequential residual transformer network (SRTN) is designed in a residual fashion to effectively learn long-range temporal dependencies across multiple video frames. The temporal multi-head self-attention module and residual strategy of the SRTN enable the proposed method to discard non-relevant features and to optimise the spatiotemporal feature vector for efficient HAR. The performance of the proposed model is evaluated on three challenging HAR datasets, yielding high accuracies of 76.1428%, 96.6399%, and 97.3130% on the HMDB51, UCF101, and UCF50 datasets, respectively, and outperforming a state-of-the-art method for HAR.
AB - In the field of research on wireless visual sensor networks, human activity recognition (HAR) using consumer electronics is an emerging research area in both the academic and industrial sectors, with a diverse range of applications. However, implementing HAR through computer vision methods is highly challenging on consumer electronic devices due to their limited computational capabilities. As a result, mainstream approaches that rely on computationally complex contextual networks and variants of recurrent neural networks to learn long-range spatiotemporal dependencies have achieved limited performance. To address these challenges, this paper presents an efficient framework for robust HAR on consumer electronics devices, which is divided into two main stages. In the first stage, convolutional features from the multiply-17 layer of a lightweight MobileNetV3 are employed to balance computational complexity and extract the most salient contextual features (7×7×576×30) from each video. In the second stage, a sequential residual transformer network (SRTN) is designed in a residual fashion to effectively learn long-range temporal dependencies across multiple video frames. The temporal multi-head self-attention module and residual strategy of the SRTN enable the proposed method to discard non-relevant features and to optimise the spatiotemporal feature vector for efficient HAR. The performance of the proposed model is evaluated on three challenging HAR datasets, yielding high accuracies of 76.1428%, 96.6399%, and 97.3130% on the HMDB51, UCF101, and UCF50 datasets, respectively, and outperforming a state-of-the-art method for HAR.
KW - Human action recognition
KW - consumer electronics
KW - surveillance system
KW - transformer network
KW - video classification
KW - wireless visual sensor networks
UR - http://www.scopus.com/inward/record.url?scp=85187339155&partnerID=8YFLogxK
U2 - 10.1109/TCE.2024.3373824
DO - 10.1109/TCE.2024.3373824
M3 - Article
AN - SCOPUS:85187339155
SN - 0098-3063
VL - 70
SP - 6800
EP - 6807
JO - IEEE Transactions on Consumer Electronics
JF - IEEE Transactions on Consumer Electronics
IS - 4
ER -