TY - GEN
T1 - Ordinal depth classification using region-based self-attention
AU - Phan, Minh Hieu
AU - Phung, Son Lam
AU - Bouzerdoum, Abdesselam
N1 - Publisher Copyright:
© 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Depth perception is essential for scene understanding, autonomous navigation and augmented reality. Depth estimation from a single 2D image is challenging due to the lack of reliable cues, e.g., stereo correspondences and motion. Modern approaches exploit multi-scale feature extraction to provide more powerful representations for deep networks. However, these studies only use simple addition or concatenation to combine the extracted multi-scale features. This paper proposes a novel region-based self-attention (rSA) unit for effective feature fusion. The rSA unit recalibrates the multi-scale responses by explicitly modelling the dependencies between channels in separate image regions. We discretize continuous depths to formulate an ordinal depth classification problem in which the relative order between categories is preserved. The experiments are performed on a dataset of 4410 RGB-D images captured in outdoor environments on the University of Wollongong campus. The proposed module improves the models on small-sized datasets by 22% to 40%.
AB - Depth perception is essential for scene understanding, autonomous navigation and augmented reality. Depth estimation from a single 2D image is challenging due to the lack of reliable cues, e.g., stereo correspondences and motion. Modern approaches exploit multi-scale feature extraction to provide more powerful representations for deep networks. However, these studies only use simple addition or concatenation to combine the extracted multi-scale features. This paper proposes a novel region-based self-attention (rSA) unit for effective feature fusion. The rSA unit recalibrates the multi-scale responses by explicitly modelling the dependencies between channels in separate image regions. We discretize continuous depths to formulate an ordinal depth classification problem in which the relative order between categories is preserved. The experiments are performed on a dataset of 4410 RGB-D images captured in outdoor environments on the University of Wollongong campus. The proposed module improves the models on small-sized datasets by 22% to 40%.
UR - http://www.scopus.com/inward/record.url?scp=85110439949&partnerID=8YFLogxK
U2 - 10.1109/ICPR48806.2021.9412477
DO - 10.1109/ICPR48806.2021.9412477
M3 - Conference contribution
AN - SCOPUS:85110439949
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3620
EP - 3627
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 25th International Conference on Pattern Recognition, ICPR 2020
Y2 - 10 January 2021 through 15 January 2021
ER -