Computer Vision

Dr. Bogdan Savchynskyy, SoSe 2025

This seminar belongs to the Master in Physics (specialization Computational Physics, code "MVSem") and the Master of Data and Computer Science, Applied Informatics (code "IS"), but is also open to students of Scientific Computing and anyone interested.

Summary

The topic of this semester is

Video-Based Scene Analysis.

We will consider inference and learning techniques for this class of problems, as well as related applications in computer vision.

General Information

Please register for the seminar in Müsli. The first seminar will take place on Thursday, April 17 at 11:00. Please make sure to participate!

Seminar: Thu, 11:00 – 13:00 in Mathematikon B (Berliner Str. 43), SR B128
The entrance is the door on the Berliner Straße side of the building. Ring the doorbell labelled "HCI am IWR" to be let in. The seminar room is on the 3rd floor.
Credits: 4 or 6 CP, depending on the course of study.

Seminar Repository:

The slides and schedule of the seminar will be placed in HeiBox.

Papers to Choose from:

[1] VideoDepthAnything: Consistent Depth Estimation for Super-Long Videos. Sili Chen; Hengkai Guo; Shengnan Zhu; Feihu Zhang; Zilong Huang; Jiashi Feng; Bingyi Kang. https://arxiv.org/pdf/2501.12375

[2] DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos. Wenbo Hu; Xiangjun Gao; Xiaoyu Li; Sijie Zhao; Xiaodong Cun; Yong Zhang; Long Quan; Ying Shan. https://arxiv.org/pdf/2409.02095

[3] ChronoDepth: Learning Temporally Consistent Video Depth from Video Diffusion Priors. Jiahao Shao; Yuanbo Yang; Hongyu Zhou; Youmin Zhang; Yujun Shen; Vitor Guizilini; Yue Wang; Matteo Poggi; Yiyi Liao. https://arxiv.org/pdf/2406.01493

[4] Video Depth without Video Models. Bingxin Ke; Dominik Narnhofer; Shengyu Huang; Lei Ke; Torben Peters; Katerina Fragkiadaki; Anton Obukhov; Konrad Schindler. https://arxiv.org/pdf/2411.19189

[5] UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler. Luigi Piccinelli; Christos Sakaridis; Yung-Hsu Yang; Mattia Segu; Siyuan Li; Wim Abbeloos; Luc Van Gool. https://arxiv.org/pdf/2502.20110

[6] Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video. David Yifan Yao; Albert J. Zhai; Shenlong Wang. https://arxiv.org/pdf/2503.21761

[7] Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator. Xiankang He; Dongyan Guo; Hongji Li; Ruibo Li; Ying Cui; Chi Zhang. https://arxiv.org/pdf/2502.19204

[8] STATIC: Surface Temporal Affine for TIme Consistency in Video Monocular Depth Estimation. Sunghun Yang; Minhyeok Lee; Suhwan Cho; Jungho Lee; Sangyoun Lee. https://arxiv.org/pdf/2412.01090

[9] MC2: Multi-view Consistent Depth Estimation via Coordinated Image-based Neural Rendering. Subin Kim; Seong Hyeon Park; Sihyun Yu; Kihyuk Sohn; Jinwoo Shin. https://neural-rendering.com/papers/31.pdf

[10] Geometry Crafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors. Tian-Xing Xu; Xiangjun Gao; Wenbo Hu; Xiaoyu Li; Song-Hai Zhang; Ying Shan. https://arxiv.org/pdf/2504.01016

[11] Depthor: Depth Enhancement from a Practical Light-Weight dToF Sensor and RGB Image. Jijun Xiang; Xuan Zhu; Xianqi Wang; Yu Wang; Hong Zhang; Fei Guo; Xin Yang. https://arxiv.org/pdf/2504.01596

[12] Endo3R: Unified Online Reconstruction from Dynamic Monocular Endoscopic Video. Jiaxin Guo; Wenzhen Dong; Tianyu Huang; Hao Ding; Ziyi Wang; Haomin Kuang; Qi Dou; Yun-hui Liu. https://arxiv.org/pdf/2504.03198

[13] Vision Transformer: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. Alexey Dosovitskiy; Lucas Beyer; Alexander Kolesnikov; Dirk Weissenborn; Xiaohua Zhai; Thomas Unterthiner; Mostafa Dehghani; Matthias Minderer; Georg Heigold; Sylvain Gelly; Jakob Uszkoreit; Neil Houlsby. https://arxiv.org/pdf/2010.11929

[14] ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. Sanghyun Woo; Shoubhik Debnath; Ronghang Hu; Xinlei Chen; Zhuang Liu; In So Kweon; Saining Xie. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10205236

[15] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Ze Liu; Yutong Lin; Yue Cao; Han Hu; Yixuan Wei; Zheng Zhang; Stephen Lin; Baining Guo. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9710580

[16] Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Computer Vision Tasks. Micah Goldblum; Hossein Souri; Renkun Ni; Manli Shu; Viraj Prabhu; Gowthami Somepalli; Prithvijit Chattopadhyay; Mark Ibrahim; Adrien Bardes; Judy Hoffman; Rama Chellappa; Andrew Gordon Wilson; Tom Goldstein. https://arxiv.org/pdf/2310.19909

[17] Seeing World Dynamics in a Nutshell. Qiuhong Shen; Xuanyu Yi; Mingbao Lin; Hanwang Zhang; Shuicheng Yan; Xinchao Wang. https://arxiv.org/pdf/2502.03465

[18] Depth Pro: Sharp Monocular Metric Depth in Less Than a Second. Aleksei Bochkovskii; Amaël Delaunoy; Hugo Germain; Marcel Santos; Yichao Zhou; Stephan R. Richter; Vladlen Koltun. https://arxiv.org/pdf/2410.02073

[19] ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth. Shariq Farooq Bhat; Reiner Birkl; Diana Wofk; Peter Wonka; Matthias Müller. https://arxiv.org/pdf/2302.12288

[20] UniK3D: Universal Camera Monocular 3D Estimation. Luigi Piccinelli; Christos Sakaridis; Mattia Segu; Yung-Hsu Yang; Siyuan Li; Wim Abbeloos; Luc Van Gool. https://arxiv.org/pdf/2503.16591

Contact

Dr. Bogdan Savchynskyy
If you contact me by email, the subject line should contain the tag [SemCV]. Emails without this tag are very likely to be lost and therefore ignored!
