TY - GEN
T1 - DDD
T2 - 2024 Eurographics Italian Chapter Conference on Smart Tools and Applications in Graphics, STAG 2024
AU - Pintore, Giovanni
AU - Agus, Marco
AU - Signoroni, Alberto
AU - Gobbetti, Enrico
N1 - Publisher Copyright:
© 2024 The Authors.
PY - 2024
Y1 - 2024
AB - We introduce a novel deep neural network for rapid and structurally consistent monocular 360° depth estimation in indoor environments. The network infers a depth map from a single gravity-aligned or gravity-rectified equirectangular image of the environment, ensuring that the predicted depth aligns with the typical depth distribution and features of cluttered interior spaces, which are usually enclosed by walls, ceilings, and floors. By leveraging the distinct characteristics of vertical and horizontal features in man-made indoor environments, we introduce a lean network architecture that employs gravity-aligned feature flattening and specialized vision transformers that exploit the input's omnidirectional nature, without segmentation into patches or positional encoding. To enhance the structural consistency of the predicted depth, we introduce a new loss function that evaluates the consistency of density maps obtained by projecting points derived from the inferred depth map onto horizontal and vertical planes. This lightweight architecture has low computational demands, provides greater structural consistency than competing methods, and does not require the explicit imposition of strong structural priors.
UR - http://www.scopus.com/inward/record.url?scp=85216187769&partnerID=8YFLogxK
U2 - 10.2312/stag.20241336
DO - 10.2312/stag.20241336
M3 - Conference contribution
AN - SCOPUS:85216187769
T3 - Eurographics Italian Chapter Proceedings - Smart Tools and Applications in Graphics, STAG
BT - Smart Tools and Applications in Graphics - Eurographics Italian Chapter Conference, STAG 2024
A2 - Fellner, Dieter
PB - Eurographics Association
Y2 - 14 November 2024 through 15 November 2024
ER -