Deconstructing Monocular Depth Reconstruction

Humans rely on stereo vision and motion parallax to estimate depth in their near surroundings. However, these cues weaken as depth increases. As a result, humans rely heavily on monocular cues when estimating depth in the far range.

Depth estimation is one of the central challenges of monocular 3D reconstruction. Computer vision algorithms for 3D reconstruction from monocular images have advanced substantially over the past few years. However, depth estimation in the far range still suffers from poor accuracy. This can be partly attributed to the insufficient cues used by current approaches. Moreover, the benchmarking procedure for these algorithms has remained largely unchanged, relying on simple metrics and sparse LiDAR data. This limits insight into the performance of each method, especially where the ground truth is incorrect.

This tutorial will serve as an introduction to the field of monocular 3D reconstruction, discussing both fundamental approaches and the recent state of the art. The focus will be on various approaches to depth estimation, from the use of graphs to implicitly reason about depth, to more explicit representations. Additionally, a core component of the tutorial will be centred on a novel Monocular Depth Estimation (MDE) benchmarking procedure. This will cover important topics such as training different baselines in a fair and comparable manner, the selection of metrics, and a new evaluation dataset containing a variety of complex urban and natural scenes.
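
For readers unfamiliar with the "simple metrics" mentioned above, the following is a minimal sketch of three measures commonly reported in the MDE literature: absolute relative error, root mean squared error, and the delta < 1.25 threshold accuracy. It is an illustration only, not the benchmarking procedure presented in the tutorial. The function name depth_metrics, the min_depth parameter, and the convention that a ground-truth value of 0 marks a pixel without a LiDAR return are assumptions made for this example.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3):
    """Compute common depth metrics over pixels with valid ground truth.

    pred and gt are flat sequences of depths in metres. Ground-truth
    values at or below min_depth are treated as missing (an assumed
    convention for sparse LiDAR) and are skipped.
    """
    # Keep only pixels with valid ground truth; clamp predictions so the
    # ratio below never divides by zero.
    pairs = [(max(p, min_depth), g) for p, g in zip(pred, gt) if g > min_depth]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)

    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Root mean squared error in metres.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Threshold accuracy: fraction of pixels with max(pred/gt, gt/pred) < 1.25.
    delta1 = sum(1 for p, g in pairs if max(p / g, g / p) < 1.25) / n

    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}


# Hypothetical values: the third pixel has no ground truth and is ignored.
metrics = depth_metrics(pred=[10.0, 30.0, 50.0, 4.5], gt=[10.0, 20.0, 0.0, 4.0])
```

Because these metrics are averaged only over pixels with a valid ground-truth value, sparse or incorrect LiDAR measurements directly limit what they can reveal about a method.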

Slides

Introduction to MDE – Jaime Spencer
Benchmarking MDE: The Design Decisions that Matter – Jaime Spencer
Introduction to BEV Mapping – Avishkar Saha
Addressing the shortcomings of BEV Mapping – Avishkar Saha

Organizers