Welcome to the 1st Monocular Depth Estimation Challenge Workshop, organized at WACV 2023.
Monocular depth estimation (MDE) is an important low-level vision task with applications in fields such as augmented reality, robotics, and autonomous vehicles. Recently, there has been increased interest in self-supervised systems capable of predicting the 3D scene structure without requiring ground-truth LiDAR training data. Automotive data has accelerated the development of these systems, thanks to the vast quantities of data, the ubiquity of stereo camera rigs, and the mostly static world. However, the evaluation process has remained focused only on the automotive domain and has been largely unchanged since its inception, relying on simple metrics and sparse LiDAR data.
This workshop seeks to answer the following questions:
- How well do networks generalize beyond their training distribution relative to humans?
- What metrics provide the most insight into the model’s performance? What is the relative weight of simple cues, e.g. height in the image, in networks and humans?
- How do the predictions made by the models differ from how humans perceive depth? Are the failure modes the same?
The workshop will therefore consist of two parts: invited keynote talks discussing current developments in MDE and a challenge organized around a novel benchmarking procedure using the SYNS dataset.
News
- 07 Jan 2023 — The workshop has now concluded! Talks will be made available soon.
- 19 Dec 2022 — Workshop schedule released.
- 23 Nov 2022 — The paper summarizing the challenge is now available on ArXiv.
- 14 Nov 2022 — The challenge has now finished! Thank you to all participants.
- 04 Nov 2022 — Challenge submission deadline has been extended until 14-Nov-2022.
- 26 Oct 2022 — Final phase of the challenge is live.
- 30 Sep 2022 — Challenge website is live! Running from 05-Oct-2022 to 08-Nov-2022.
- 16 Sep 2022 — Oisin Mac Aodha confirmed as keynote speaker.
- 17 Aug 2022 — James Elder confirmed as keynote speaker.
- 17 Aug 2022 — Website is live!
Important Dates
- 05 Oct 2022 (00:00 UTC) — Challenge Development Phase Opens (Val)
- 26 Oct 2022 (00:00 UTC) — Challenge Final Phase Opens (Test)
- 08 Nov 2022 (23:59 UTC) — Challenge Submission Closes (original deadline)
- 14 Nov 2022 (23:59 UTC) — Challenge Submission Closes (extended deadline)
- 11 Nov 2022 — Method Description Submission
- 15 Nov 2022 — Invited Talk Notification
- 07 Jan 2023 (Half-day AM) — MDEC Workshop @ WACV 2023
Schedule
NOTE: Times are shown in Hawaii Standard Time. Please take this into account if joining the workshop virtually.
*Virtual talk only. All other talks will be hybrid.
| Time (HST) | Duration | Event |
|---|---|---|
| 08:20 - 08:30 | 10 mins | Introduction |
| 08:30 - 09:15 | 45 mins | Oisin Mac Aodha – Advancing Monocular Depth Estimation* |
| 09:15 - 10:00 | 45 mins | James Elder – Monocular 3D Perception in Humans and Machines |
| 10:00 - 10:30 | 30 mins | Break |
| 10:30 - 11:00 | 30 mins | The Monocular Depth Estimation Challenge |
| 11:00 - 11:20 | 20 mins | Challenge Participant: Team z.suri* |
| 11:20 - 11:40 | 20 mins | Challenge Participant: Team MonoViT* |
| 11:40 - 11:50 | 10 mins | Closing Notes |
Keynote Speakers
James Elder is Professor and York Research Chair in Human and Computer Vision in the Department of Electrical Engineering & Computer Science (Lassonde School of Engineering) and the Department of Psychology (Faculty of Health), and Co-Director of the Centre for AI & Society at York University, Toronto, Canada. Dr. Elder’s research seeks to improve machine vision systems through a better understanding of visual processing in biological systems. He currently leads the ORF-RE project Intelligent Systems for Sustainable Urban Mobility. He also holds a number of patents on attentive vision technologies and is the co-founder of the AI start-up AttentiveVision. He serves on the editorial boards of three international journals.
Oisin Mac Aodha is a Lecturer in Machine Learning in the School of Informatics at the University of Edinburgh. From 2016 to 2019, he was a postdoc in Prof. Pietro Perona’s Computational Vision Lab at Caltech. Prior to that, he was a postdoc in the Department of Computer Science at University College London (UCL) with Prof. Gabriel Brostow and Prof. Kate Jones. He received his PhD from UCL in 2014, advised by Prof. Gabriel Brostow, and holds an MSc in Machine Learning from UCL and a BEng in Electronic and Computing Engineering from the University of Galway. He is a Fellow of the Alan Turing Institute and a European Laboratory for Learning and Intelligent Systems (ELLIS) Scholar. His current research interests are in the areas of computer vision and machine learning, with a specific emphasis on shape and depth estimation, human-in-the-loop learning, and fine-grained image understanding.
Challenge Winners
Congratulations to the OPDAI team on the top-performing submission (F-Score)!
| Team | F-Score | F-Score (Edges) | MAE | RMSE | AbsRel | Acc (Edges) | Comp (Edges) |
|---|---|---|---|---|---|---|---|
| Baseline | 13.72 | 7.76 | 5.56 | 9.72 | 32.04 | 3.97 | 21.63 |
| OPDAI | 13.53 | 7.41 | 5.20 | 8.98 | 29.66 | 3.67 | 27.31 |
| z.suri | 13.08 | 7.46 | 5.39 | 9.27 | 29.96 | 3.81 | 32.70 |
| Anonymous | 12.85 | 7.30 | 5.32 | 9.04 | 30.22 | 3.83 | 43.77 |
| MonoViT | 12.66 | 7.51 | 5.22 | 8.96 | 29.70 | 3.36 | 35.47 |
Teams
- OPDAI: Hao Wang, Yusheng Zhang, Heng Cong
- z.suri: Zeeshan Khan Suri
- Anonymous
- MonoViT: Chaoqiang Zhao, Matteo Poggi, Fabio Tosi, Youming Zhang, Yang Tang, Stefano Mattoccia
Challenge
Teams submitting to the challenge are also required to submit a description of their method. As part of the WACV Proceedings, we will publish a paper summarizing the results of the challenge, including a description of each method. All challenge participants surpassing the performance of the Garg baseline (F-Score: 13.7211, submitted by jspenmar) will be added as authors of this paper. Top performers will additionally be invited to present their method at the workshop, either in person or virtually.
[GitHub] — [Challenge] — [Paper]
The challenge focuses on evaluating novel MDE techniques on the SYNS-Patches dataset proposed in this benchmark. This dataset provides a challenging variety of urban and natural scenes, including forests, agricultural settings, residential streets, industrial estates, lecture theatres, offices and more. Furthermore, the high-quality dense ground-truth LiDAR allows for the computation of more informative evaluation metrics, such as those focused on depth discontinuities.
The challenge is hosted on CodaLab. We have provided a GitHub repository containing training and evaluation code for multiple recent state-of-the-art (SotA) approaches to MDE. These serve as competitive baselines for the challenge and as a starting point for participants. The challenge leaderboards use the withheld validation and test sets of SYNS-Patches. We additionally encourage evaluation on the public KITTI Eigen-Benchmark dataset.
Submissions will be evaluated on a variety of metrics:
- Pointcloud reconstruction: F-Score
- Image-based depth: MAE, RMSE, AbsRel
- Depth discontinuities: F-Score, Accuracy, Completeness
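For concreteness, here is a minimal numpy/scipy sketch of how the image-based and edge-based metrics above are commonly defined. The function names, the validity-mask convention, and the 10-pixel distance truncation are illustrative assumptions, not the official challenge implementation.

```python
import numpy as np
from scipy import ndimage

def depth_metrics(pred, gt):
    """Image-based depth metrics over valid ground-truth pixels.

    pred, gt: (H, W) depth maps in metres; gt == 0 marks missing LiDAR returns
    (an assumption about the ground-truth encoding).
    """
    mask = gt > 0
    p, g = pred[mask], gt[mask]
    mae = np.abs(p - g).mean()               # Mean Absolute Error (m)
    rmse = np.sqrt(((p - g) ** 2).mean())    # Root Mean Squared Error (m)
    absrel = (np.abs(p - g) / g).mean()      # Absolute Relative error
    return {"MAE": mae, "RMSE": rmse, "AbsRel": absrel}

def edge_metrics(pred_edges, gt_edges, max_dist=10.0):
    """Chamfer-style accuracy/completeness between depth-edge maps.

    pred_edges, gt_edges: boolean (H, W) edge masks. Distances are in pixels;
    the truncation at 10 px is an assumed value.
    """
    # Distance from every pixel to the nearest edge pixel in each map.
    dist_to_gt = ndimage.distance_transform_edt(~gt_edges)
    dist_to_pred = ndimage.distance_transform_edt(~pred_edges)
    acc = np.clip(dist_to_gt[pred_edges], 0, max_dist).mean()    # pred edges -> GT
    comp = np.clip(dist_to_pred[gt_edges], 0, max_dist).mean()   # GT edges -> pred
    return {"Acc": acc, "Comp": comp}
```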
Challenge winners will be determined based on the pointcloud-based F-Score performance.
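As an illustration of this winning criterion, below is a hedged sketch of a point-cloud F-Score in the usual precision/recall-at-threshold form. The 0.1 m threshold and the function name are assumptions rather than the official challenge settings.

```python
import numpy as np
from scipy.spatial import cKDTree

def pointcloud_fscore(pred_pts, gt_pts, threshold=0.1):
    """F-Score between predicted and ground-truth (N, 3) point clouds.

    A predicted point counts as correct if it lies within `threshold` metres
    of some ground-truth point (precision); completeness is measured the
    other way round (recall). The threshold value is an assumption.
    """
    dist_p2g, _ = cKDTree(gt_pts).query(pred_pts)    # pred -> nearest GT point
    dist_g2p, _ = cKDTree(pred_pts).query(gt_pts)    # GT -> nearest pred point

    precision = (dist_p2g < threshold).mean()
    recall = (dist_g2p < threshold).mean()
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```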