Learning to Synthesize Images

Seminar held and organized by

Prof. Dr. Carsten Rother

Philip-W. Grassal

Titus Leistner

Visual Learning Lab, Heidelberg University

 

Target audience: (Master's) students from the physics, mathematics, and computer science departments
Prerequisites: Prior knowledge of machine learning and computer vision/graphics is beneficial
Point of contact: For issues related to the seminar, please contact Philip-W. Grassal (contact details: https://hci.iwr.uni-heidelberg.de/vislearn/people/). All upcoming information regarding the seminar will be sent to the participants registered via Muesli.
Registration: Via Muesli (https://muesli.mathi.uni-heidelberg.de/lecture/view/1341)
Maximum number of participants: 23
Seminar material: Available on HeiBox (You will get the link and password after registration)
Time: Tuesdays, 2:00 - 3:30 pm
Place: The seminar will be held via heiCONF (You will get the link and password after registration)
Grading: All students are required to give a 30-minute presentation and hand in a written report about a computer vision paper. See details below.

Description

Images and videos are omnipresent. Nowadays, they are not always taken by a camera but are also synthesized by learning-based algorithms. Such new imagery can be used for many applications that benefit society: better video conferencing, gaming, augmented and virtual reality, training data for computer vision applications, and entertainment. It can, however, also be used for malicious activity, such as fake news.

In this seminar we look at a broad range of areas related to synthesizing images, with a particular focus on the latest trends. This includes image generation, differentiable rendering, image-based rendering, novel view synthesis, and human avatar synthesis. On the methodological side, we will discuss various deep learning architectures that have tremendously influenced recent progress.

Seminar structure: In the first meeting of the seminar we give a brief introduction to the broad field. In this meeting, each person is assigned one paper. In each seminar slot, two people present in front of the class. In addition, each student has to write a short report about their paper. In this report, the student follows the standard review process for a scientific article: besides summarizing the article, the student has to discuss the positive and negative aspects of the paper. See details below.

Literature

Image Generation (2 Talks)

·         Karras, T., Laine, S., & Aila, T. (2019). A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 4401-4410).

·         Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2019, May). Self-attention generative adversarial networks. In International Conference on Machine Learning (pp. 7354-7363). PMLR.

Differentiable Rasterization and Ray Tracing (5 Talks)

·         Loper, M. M., & Black, M. J. (2014, September). OpenDR: An approximate differentiable renderer. In European Conference on Computer Vision (pp. 154-169).

·         Chen, W., et al. (2019). Learning to predict 3D objects with an interpolation-based differentiable renderer. In Advances in Neural Information Processing Systems (NeurIPS).

·         Liu, S., Li, T., Chen, W., & Li, H. (2019). Soft rasterizer: A differentiable renderer for image-based 3D reasoning. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 7708-7717).

·         Kato, H., Ushiku, Y., & Harada, T. (2018). Neural 3D mesh renderer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3907-3916).

·         Li, T. M., Aittala, M., Durand, F., & Lehtinen, J. (2018). Differentiable Monte Carlo ray tracing through edge sampling. ACM Transactions on Graphics (TOG), 37(6), 1-11.

Image-based Rendering (11 Talks)

·         Riegler, G., & Koltun, V. (2020, August). Free view synthesis. In European Conference on Computer Vision (pp. 623-640). Springer, Cham.

·         Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., & Ng, R. (2020, August). NeRF: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision (pp. 405-421). Springer, Cham.
and its follow-up:
Zhang, K., Riegler, G., Snavely, N., & Koltun, V. (2020). NeRF++: Analyzing and improving neural radiance fields. arXiv preprint arXiv:2010.07492.

·         Levoy, M., & Hanrahan, P. (1996, August). Light field rendering. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (pp. 31-42).

·         Debevec, P. E., Taylor, C. J., & Malik, J. (1996, August). Modeling and rendering architecture from photographs: A hybrid geometry-and image-based approach. In Proceedings of the 23rd annual conference on Computer graphics and interactive techniques (pp. 11-20).

·         Mildenhall, B., Srinivasan, P. P., Ortiz-Cayon, R., Kalantari, N. K., Ramamoorthi, R., Ng, R., & Kar, A. (2019). Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Transactions on Graphics (TOG), 38(4), 1-14.

·         Bemana, M., Myszkowski, K., Seidel, H. P., & Ritschel, T. (2020). X-Fields: implicit neural view-, light- and time-image interpolation. ACM Transactions on Graphics (TOG), 39(6), 1-15.

·         Broxton, M., Flynn, J., Overbeck, R., Erickson, D., Hedman, P., Duvall, M., ... & Debevec, P. (2020). Immersive light field video with a layered mesh representation. ACM Transactions on Graphics (TOG), 39(4), 86-1.

·         Thies, J., Zollhöfer, M., & Nießner, M. (2019). Deferred neural rendering: Image synthesis using neural textures. ACM Transactions on Graphics (TOG), 38(4), 1-12.

·         Chen, A., Wu, M., Zhang, Y., Li, N., Lu, J., Gao, S., & Yu, J. (2018). Deep surface light fields. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 1(1), 1-17.

·         Wang, Q., Wang, Z., Genova, K., Srinivasan, P., Zhou, H., Barron, J. T., ... & Funkhouser, T. (2021). IBRNet: Learning Multi-View Image-Based Rendering. arXiv preprint arXiv:2102.13090.

·         Zhou, T., Tucker, R., Flynn, J., Fyffe, G., & Snavely, N. (2018). Stereo magnification: Learning view synthesis using multiplane images. arXiv preprint arXiv:1805.09817.

Human Image Synthesis (5 Talks)

·         Genova, K., Cole, F., Maschinot, A., Sarna, A., Vlasic, D., & Freeman, W. T. (2018). Unsupervised training for 3d morphable model regression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 8377-8386).

·         Lombardi, S., Simon, T., Schwartz, G., Zollhoefer, M., Sheikh, Y., & Saragih, J. (2021). Mixture of Volumetric Primitives for Efficient Neural Rendering. arXiv preprint arXiv:2103.01954.

·         Wang, T. C., Mallya, A., & Liu, M. Y. (2020). One-Shot Free-View Neural Talking-Head Synthesis for Video Conferencing. arXiv preprint arXiv:2011.15126.

·         Zakharov, E., Ivakhnenko, A., Shysheya, A., & Lempitsky, V. (2020, August). Fast Bi-layer Neural Synthesis of One-Shot Realistic Head Avatars. In European Conference on Computer Vision (pp. 524-540). Springer, Cham.

·         Lombardi, S., Saragih, J., Simon, T., & Sheikh, Y. (2018). Deep appearance models for face rendering. ACM Transactions on Graphics (TOG), 37(4), 1-13.

Schedule

We start with a kick-off meeting in which each student chooses a paper from the list above that he or she would like to present. The kick-off meeting is followed by three Q&A sessions in which the teaching assistants discuss fundamentals relevant to the seminar. Students who are not familiar with these fundamentals should prepare by reading the suggested literature for the respective session. After these introductory sessions, each student presents their chosen paper and hands in their written report (see the grading details below).

The schedule of talks is subject to change depending on the number of participants.

13.04. - Kick-Off Meeting (Introduction, Seminar overview, Administration, Topic assignment)

20.04. - Fundamentals of Computer Graphics

·         If you are not familiar with the fundamentals of computer graphics (especially geometry representations, the rendering pipeline, transformations, and texture mapping), we suggest reading chapters 1, 2, 4, 6 in

Real-Time Rendering, 4th Edition by Tomas Akenine-Möller; Eric Haines; Naty Hoffman

·         The book is available for online reading to students registered at Heidelberg University via Heidi (the university library catalogue).

·         On 20.04., we will briefly summarize those chapters and answer questions.
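
As a small taste of this session's material, the following sketch (a toy example of ours, not taken from the book; function names are our own) composes homogeneous 4x4 transformations, the core operation of the geometry stage of the rendering pipeline:

```python
# Illustrative sketch: composing homogeneous 4x4 transformations,
# as done in the geometry stage of the rendering pipeline.
import numpy as np

def rotation_z(theta):
    """4x4 rotation about the z-axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def translation(t):
    """4x4 translation by the 3-vector t."""
    M = np.eye(4)
    M[:3, 3] = t
    return M

# Model transform: first rotate a vertex 90 degrees about z, then shift it along x.
model = translation([1.0, 0.0, 0.0]) @ rotation_z(np.pi / 2.0)
vertex = np.array([1.0, 0.0, 0.0, 1.0])  # homogeneous coordinates (w = 1)
print(model @ vertex)                    # -> approximately [1, 1, 0, 1]
```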

27.04. - Fundamentals of Computer Vision / 3D Reconstruction

·         If you are not familiar with the fundamentals of computer vision (in particular image formation, recognition, reconstruction, and depth estimation), we suggest reading chapters 1, 2, (6), 11, 12 in

Computer Vision: Algorithms and Applications, 2nd Edition by Richard Szeliski

·         The book is available at http://szeliski.org/Book

·         On 27.04., we will briefly summarize those chapters and answer questions.
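
As a small taste of this session's material, here is a minimal sketch (a toy example of ours, not taken from the book) of depth from disparity in a rectified stereo pair, one of the basic building blocks of 3D reconstruction:

```python
# Illustrative sketch: recovering depth from stereo disparity.
# For a rectified camera pair: Z = f * B / d.
import numpy as np

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth (meters) from disparity (pixels), focal length (pixels), baseline (meters)."""
    return focal_length_px * baseline_m / np.asarray(disparity_px, dtype=float)

# A point shifted by 20 px between the views, f = 700 px, baseline = 0.1 m:
print(depth_from_disparity(20.0, 700.0, 0.1))  # -> 3.5 (meters)
```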

04.05. - Fundamentals of Deep Learning / Optimization

·         If you are not familiar with deep learning (in particular gradient-based optimization, CNNs, and GANs), we suggest reading chapters 5, 6, 8, 9 in

Deep Learning, by Ian Goodfellow and Yoshua Bengio and Aaron Courville

·         The book is available at https://www.deeplearningbook.org/

·         Alternatively, you may read chapter 5 in

Computer Vision: Algorithms and Applications, 2nd Edition by Richard Szeliski

from the previous week.

·         On 04.05., we will briefly summarize those chapters and answer questions.
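
As a small taste of this session's material, the following sketch (a toy example of ours, not taken from either book) fits a line by gradient descent on a mean-squared-error loss; the same mechanism, scaled up, is what trains CNNs and GANs:

```python
# Illustrative sketch: gradient-based optimization of a least-squares fit.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
y = 3.0 * x + 0.5 + 0.01 * rng.normal(size=(100, 1))  # noisy line y = 3x + 0.5

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    y_hat = w * x + b
    grad_w = 2.0 * np.mean((y_hat - y) * x)  # d(MSE)/dw
    grad_b = 2.0 * np.mean(y_hat - y)        # d(MSE)/db
    w -= lr * grad_w                         # gradient descent step
    b -= lr * grad_b

print(w, b)  # -> close to the true parameters 3.0 and 0.5
```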

11.05. - 2 Talks (Image Generation)

18.05. - 2 Talks (Differentiable Rendering)

25.05. - 2 Talks (Differentiable Rendering)

01.06. - 2 Talks (Differentiable Rendering + Image-Based Rendering)

08.06. - 2 Talks (Image-Based Rendering)

15.06. - 2 Talks (Image-Based Rendering)

22.06. - 2 Talks (Image-Based Rendering)

29.06. - 2 Talks (Image-Based Rendering)

06.07. - 2 Talks (Human Image Synthesis)

13.07. - 2 Talks (Human Image Synthesis)

20.07. - 1 Talk (Human Image Synthesis) + Wrap-up

01.09. - Deadline for handing in your reports!

Grading

The grade for the seminar is based on a graded report and a presentation. In addition, we will award extra credit for active participation and valuable contributions to the seminar discussions. Attendance is required; you may miss at most two sessions.

Presentation

Each student is required to give a 30-minute talk about a paper from the list above. We will confirm the choice of paper and assign a presentation slot. After the talk, the presenting student has to answer questions and discuss their topic with the audience for about 15 minutes. During the presentation, the student shall

·         explain prior work and the problem domain thoroughly. The audience should be able to understand what the problem is, how it has been solved so far, and what distinguishes the presented method from related work.

·         present the details of the method proposed in the paper. The student shall explain the method in their own words and show that they fully understand the topic.

·         detail what experiments have been conducted in order to demonstrate the success of the method. What is the purpose of each experiment, and what do the results indicate? Are the findings reasonable?

·         critically analyze the method, experiments and findings.

The week prior to their presentation, the student is supposed to hand in their preliminary slides as a PDF file and may contact their assigned supervisor to request feedback.

Report

In addition to the presentation, every student is required to write a 5-page report that reviews the chosen paper. The report should follow conference reviewing guidelines, with the exception that your summary should be longer (4 pages) and elaborate on the contributions and experiments in greater detail. Your report shall be structured as follows:

·         Summary (4 pages): Explain the key ideas, contributions, and their significance. This is your version of the paper. The summary helps us to understand the rest of your review and be confident that you understand the paper.

·         Strengths (1/4 page): What about the paper provides value -- interesting ideas that are experimentally validated, an insightful organization of related work, new tools, impressive results, something else? Most importantly, what can someone interested in the topic learn from the paper?

·         Weaknesses (1/4 page): What detracts from the contributions? Does the paper lack controlled experiments to validate the contributions? Are there misleading claims or technical errors? Is it possible to understand (and ideally reproduce) the method and experimental setups by reading the paper?

·         Rating and Justification (1/4 page): Carefully explain why the paper should be accepted or not. This section should make clear which of the strengths and weaknesses you consider most significant.

·         Additional comments (1/4 page): minor suggestions, questions, corrections, etc. that can help the authors improve the paper, but are not crucial for the overall recommendation.

The last page shows whether the student not only understands the method of the paper but is also able to assess its value with respect to prior work in the problem domain.

The layout of the report must follow the CVPR paper template (see http://cvpr2020.thecvf.com/sites/default/files/2019-09/cvpr2020AuthorKit...). Students must also include their name, student number, email address, and subject of study in their report; missing information will affect the final grade.