Welcome to the 4th Monocular Depth Estimation Challenge Workshop, organized at CVPR 2025.
Monocular depth estimation (MDE) is an important low-level vision task with applications in fields such as augmented reality, robotics, and autonomous vehicles. In 2024, the field was dominated by generative approaches, with Depth Anything representing transformer-based solutions and Marigold being a denoising diffusion model built on the popular text-to-image LDM Stable Diffusion. Even before that, there was growing interest in self-supervised systems capable of predicting 3D scene structure without requiring ground-truth LiDAR training data. The automotive industry accelerated the development of these systems thanks to its vast quantities of data and the ubiquity of stereo camera rigs. However, evaluation has remained focused on in-domain performance, relying on simple metrics and sparse LiDAR data.
This workshop seeks to answer the following questions:
- How well do networks generalize beyond their training distribution relative to humans?
- What metrics provide the most insight into the model’s performance?
- How do the predictions made by the models differ from how humans perceive depth?
The workshop will consist of two parts: invited keynote talks discussing current developments in MDE and a challenge organized around a benchmarking procedure using the SYNS dataset.
News
- 01 Feb 2025 — 🏆 Submissions to the challenge are now open!
- 30 Jan 2025 — Konrad Schindler confirmed as keynote speaker.
- 07 Jan 2025 — Yiyi Liao confirmed as keynote speaker.
- 06 Jan 2025 — Peter Wonka confirmed as keynote speaker.
- 05 Jan 2025 — Website is live!
Keynote Speakers

Peter Wonka
Full Professor
KAUST

Yiyi Liao
Assistant Professor
Zhejiang University

Konrad Schindler
Full Professor
ETH Zurich
Peter Wonka is a full professor of computer science at King Abdullah University of Science and Technology (KAUST). He received his doctorate in computer science from the Technical University of Vienna, along with a Master of Science in urban planning from the same institution. After his Ph.D., he worked as a postdoctoral researcher at the Georgia Institute of Technology and as faculty at Arizona State University. His publications tackle various topics in computer vision, computer graphics, and machine learning; his current research focuses on deep learning, generative models, and 3D shape analysis and reconstruction.
Yiyi Liao is an assistant professor at Zhejiang University. She received her Ph.D. degree from Zhejiang University and subsequently worked as a postdoc at the MPI for Intelligent Systems. Her research interests lie in 3D computer vision and immersive media, including reconstruction, generation, and compression. She received the Best Robot Vision Paper award at ICRA 2024, and she serves as a program chair for 3DV 2025 and an area chair for CVPR and NeurIPS.
Konrad Schindler received the Diplomingenieur (M.Tech.) degree in photogrammetry from the Vienna University of Technology, Vienna, Austria, in 1999 and the Ph.D. degree from the Graz University of Technology, Graz, Austria, in 2003. He worked as a photogrammetric engineer in private industry and held researcher positions at the Computer Graphics and Vision Department, Graz University of Technology; the Digital Perception Laboratory, Monash University, Melbourne, VIC, Australia; and the Computer Vision Laboratory, ETH Zürich, Zürich, Switzerland. He became an Assistant Professor of Image Understanding at TU Darmstadt, Darmstadt, Germany, in 2009. Since 2010, he has been a Tenured Professor of Photogrammetry and Remote Sensing at ETH Zürich. His research interests include computer vision, photogrammetry, and remote sensing, with a focus on image understanding and 3D reconstruction. Dr. Schindler has been serving as an Associate Editor of the Journal of Photogrammetry and Remote Sensing of the International Society for Photogrammetry and Remote Sensing (ISPRS) since 2011, and previously served as an Associate Editor of the Image and Vision Computing Journal from 2011 to 2016. He was the TC President of the ISPRS from 2012 to 2016.
Challenge
The challenge focuses on evaluating novel MDE techniques on the SYNS-Patches dataset. This dataset provides a challenging variety of urban and natural scenes, including forests, agricultural settings, residential streets, industrial estates, lecture theatres, offices, and more. Furthermore, the high-quality, dense ground-truth LiDAR allows for the computation of more informative evaluation metrics, such as those focused on depth discontinuities.
[GitHub Starter Pack] — [CodaLab Challenge]
⚡ What’s new in MDEC 2025?
- 📐 New prediction types: The challenge became more accessible thanks to the added support of `affine-invariant` predictions. `metric` and `scale-invariant` predictions are also automatically supported, and `disparity` predictions, which were supported in previous challenges, are still accepted (the sketch after this list illustrates all four types).
- 🤗 Pre-trained Model Support: We provide ready-to-use scripts for off-the-shelf methods: Depth Anything V2 (`disparity`) and Marigold (`affine-invariant`). These will serve as competitive baselines for the challenge and a starting point for participants.
- 📊 Updated Evaluation Pipeline: The CodaLab grader code has been updated to accommodate the newly supported prediction types.
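To make the four prediction types concrete, here is a minimal numpy sketch with synthetic values (the array shape and constants are illustrative only, not part of the starter pack):

```python
import numpy as np

# Synthetic metric depth map standing in for a real prediction (meters).
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 80.0, size=(192, 640)).astype(np.float32)

pred_metric = depth.copy()            # `metric`: absolute depth, no alignment needed
pred_scale_inv = 0.37 * depth         # `scale-invariant`: correct up to an unknown global scale
pred_affine_inv = 0.02 * depth + 0.5  # `affine-invariant`: unknown scale AND shift (e.g., Marigold)
pred_disparity = 1.0 / depth          # `disparity`: proportional to inverse depth (e.g., Depth Anything V2)
```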
🚀 How to participate?
- Check out the new starter pack on GitHub. The `mdec_2025` folder contains scripts generating valid submissions for Marigold (`affine-invariant`) and Depth Anything v2 (`disparity`).
- Identify the prediction type of your method and generate a valid submission: the `val` split for the “Development” phase and the `test` split for the “Final” phase (a hypothetical packaging sketch follows this list).
- Register at the CodaLab Challenge site, check the submission constraints and extra conditions, and submit to the leaderboard.
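The snippet below is a hypothetical packaging sketch: the key names (`pred`, `pred_type`), array layout, and file name are assumptions for illustration, and the `mdec_2025` scripts in the starter pack remain the authoritative reference for the actual submission format:

```python
import numpy as np

# Hypothetical submission packaging: stack one prediction per image and record
# the prediction type so the grader knows which alignment to apply.
preds = [np.random.rand(376, 1242).astype(np.float32) for _ in range(3)]  # stand-in maps

np.savez_compressed(
    "pred_val.npz",               # `val` split for the “Development” phase
    pred=np.stack(preds),         # (N, H, W), one map per SYNS-Patches image
    pred_type="affine-invariant", # metric | scale-invariant | affine-invariant | disparity
)
```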
The phases are open according to the following schedule:
- “Development”: Feb 01 - Mar 01
- “Final”: Mar 01 - Mar 21
📊 Evaluation
Submissions will be evaluated on a variety of metrics:
- Pointcloud reconstruction: F-Score
- Image-based depth: MAE, RMSE, AbsRel
- Depth discontinuities: F-Score, Accuracy, Completeness
The leading metric is the pointcloud-based F-Score, denoted as F (↑) in the leaderboard. Challenge winners will be determined by their ranking on this metric over the withheld validation (“Development” phase) and test (“Final” phase) sets of the SYNS-Patches dataset.
To measure the performance locally with other datasets or troubleshoot scoring issues within the challenge, refer to the evaluation code.
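For a quick local sanity check, the image-based metrics follow standard definitions and are easy to reproduce in plain numpy; the sketch below does exactly that, plus a generic pointcloud F-Score (the distance threshold `tau` is an assumed value; the edge-based metrics and the official thresholds are defined in the evaluation code):

```python
import numpy as np
from scipy.spatial import cKDTree

def depth_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """Image-based depth metrics over valid (gt > 0) pixels."""
    mask = gt > 0
    pred, gt = pred[mask], gt[mask]
    ratio = np.maximum(gt / pred, pred / gt)  # per-pixel ratio for the delta accuracies
    return {
        "MAE": np.abs(pred - gt).mean(),
        "RMSE": np.sqrt(((pred - gt) ** 2).mean()),
        "AbsRel": (np.abs(pred - gt) / gt).mean(),
        "δ<1.25": (ratio < 1.25).mean(),
        "δ<1.25^2": (ratio < 1.25 ** 2).mean(),
        "δ<1.25^3": (ratio < 1.25 ** 3).mean(),
    }

def pointcloud_fscore(pred_pts: np.ndarray, gt_pts: np.ndarray, tau: float = 0.1) -> float:
    """F-Score between two (N, 3) point clouds: harmonic mean of precision and recall at tau."""
    precision = (cKDTree(gt_pts).query(pred_pts)[0] < tau).mean()  # pred -> nearest gt
    recall = (cKDTree(pred_pts).query(gt_pts)[0] < tau).mean()     # gt -> nearest pred
    return 2 * precision * recall / max(precision + recall, 1e-8)
```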
📈 Baselines
This year, we switched to LSE-based alignment between predictions and ground-truth maps to accept various types of predictions. In addition to the previously accepted `disparity` prediction methods, we welcome `affine-invariant`, `scale-invariant`, and `metric` types.
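As a minimal sketch of what LSE-based alignment does, assuming an ordinary least-squares fit in depth space (the grader may fit in a different space or mask pixels differently), an `affine-invariant` prediction is mapped onto the ground truth by solving for the scale and shift that minimize the squared error:

```python
import numpy as np

def align_lse(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Scale-and-shift alignment of an `affine-invariant` prediction to ground truth.

    Solves min over (s, t) of ||s * pred + t - gt||^2 on valid pixels, then
    applies (s, t) to the full map. For `scale-invariant` predictions, fix
    t = 0; `metric` predictions need no alignment at all.
    """
    mask = gt > 0
    p, g = pred[mask].ravel(), gt[mask].ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)      # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)  # closed-form least-squares solution
    return s * pred + t
```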
Accordingly, we updated the benchmark with more recent baselines, such as Marigold (`affine-invariant`), Depth Anything v2 (`disparity`), and the winners of the 3rd edition of the MDEC challenge, whose performance is reported below.
| Method | F (↑) | F (↑) (Edges) | MAE (↓) | RMSE (↓) | AbsRel (↓) | Acc (↑) (Edges) | Comp (↓) (Edges) | δ<1.25 (↑) | δ<1.25^2 (↑) | δ<1.25^3 (↑) |
|---|---|---|---|---|---|---|---|---|---|---|
| PICO-MR | 21.07 | 8.77 | 3.22 | 5.60 | 20.33 | 3.69 | 15.41 | 0.7559 | 0.9125 | 0.9590 |
| EVP++ | 19.66 | 9.02 | 3.20 | 5.49 | 19.03 | 2.66 | 9.28 | 0.7553 | 0.9182 | 0.9661 |
| Marigold | 18.64 | 9.26 | 3.87 | 6.49 | 24.37 | 2.90 | 20.09 | 0.6903 | 0.8860 | 0.9453 |
| Depth Anything v2 | 14.34 | 7.94 | 4.16 | 7.94 | 25.48 | 2.64 | 30.05 | 0.6907 | 0.8849 | 0.9469 |
| Garg’s Baseline | 11.38 | 6.03 | 4.62 | 7.58 | 31.15 | 4.01 | 41.24 | 0.5842 | 0.8354 | 0.9251 |
📚 Workshop proceedings
As part of the CVPR Workshop Proceedings, we will publish a paper summarizing the results of the challenge. The following conditions must be met to have the method included in the paper:
- The method surpasses the performance of the baselines on the leading metric (F-Score);
- The method is not trivial;
- Each prediction is made using a single corresponding input image.
Once the challenge has finished, we will reach out to the participants meeting the criteria above to request their affiliations, a short description of their method, and the method’s source code. Participants who do not provide this information will not be added to the publication; their submissions will remain anonymous on the leaderboard.
Selected top performers will also be invited to present their methods at the workshop, either in person or virtually. Presenting is mandatory; declining will result in an invalidated submission and removal from the paper.
🤵 Organizers

Anton Obukhov
Principal Research Scientist
Huawei Research Center Zürich

Ripudaman Singh Arora
Principal ML Researcher
Blue River Technology

Jaime Spencer
Data Engineer
Oxa

Fabio Tosi
Junior Assistant Professor
University of Bologna

Matteo Poggi
Tenure-Track Assistant Professor
University of Bologna

Chris Russell
Associate Professor
Oxford Internet Institute

Simon Hadfield
Associate Professor
University of Surrey

Richard Bowden
Professor
University of Surrey