Humans rely on stereo vision and motion parallax to estimate depth in their near surroundings. However, these cues become weaker as depth increases. As a result, humans rely profoundly on monocular cues when estimating depth in the far range.
Depth estimation is one of the central challenges of monocular 3D reconstruction. Computer vision algorithms for 3D reconstruction from monocular images have advanced substantially over the past few years. However, depth estimation in the far range still suffers from poor accuracy. This can be partly attributed to the insufficient cues used by current approaches. Moreover, the benchmarking procedure for these algorithms has remained largely unchanged relying on simple metrics and sparse LiDAR data. This prevents insights into the performance of each method, especially where the ground-truth is incorrect.
This tutorial will serve as an introduction to the field of monocular 3D reconstruction, discussing both fundamental approaches and recent State-of-the-Art. The focus will be on various approaches to depth estimation, from the use of graphs to implicitly reason about depth, to more explicit representations. Additionally, a core component of the tutorial will be centred on a novel Monocular Depth Estimation (MDE) benchmarking procedure. This will cover important topics such as training different baselines in a fair and comparable manner, the selection of metrics and a new evaluation dataset containing a variety of complex urban and natural scenes.
Slides
- Introduction to MDE – Jaime Spencer
- Benchmarking MDE: The Design Decisions that Matter – Jaime Spencer
- Introduction to BEV Mapping – Avishkar Saha
- Addressing the shortcomings of BEV Mapping – Avishkar Saha
Organizers

Jaime Spencer
Research Fellow
University of Surrey

Avishkar Saha
PhD Student
University of Surrey

Chris Russell
Senior Applied Scientist
Amazon

Simon Hadfield
Senior Lecturer
University of Surrey

Richard Bowden
Professor
University of Surrey