Efficient dense stereo with occlusions for new view-synthesis by four-state dynamic programming

Title	Efficient dense stereo with occlusions for new view-synthesis by four-state dynamic programming
Publication Type	Journal Article
Year of Publication	2007
Authors	Criminisi, A, Blake, A, Rother, C, Shotton, J, Torr, PHS
Journal	International Journal of Computer Vision
Volume	71
Pagination	89–110
ISSN	0920-5691
Keywords	Dense stereo, Gaze correction, Image-based rendering, Video-conferencing
Abstract	A new algorithm is proposed for efficient stereo and novel view synthesis. Given the video streams acquired by two synchronized cameras, the proposed algorithm synthesises images from a virtual camera at an arbitrary position near the physical cameras. The new technique is based on an improved dynamic-programming stereo algorithm for efficient novel view generation. The two main contributions of this paper are: (i) a new four-state matching graph for dense stereo dynamic programming that supports accurate occlusion labelling; (ii) a compact geometric derivation for novel view synthesis by direct projection of the minimum cost surface. Furthermore, the paper presents an algorithm for the temporal maintenance of a background model to enhance the rendering of occlusions and reduce temporal artefacts (flicker), and a cost aggregation algorithm that acts directly in the three-dimensional matching cost space. The proposed algorithm has been designed to work with input images with a large disparity range, a common practical situation. The enhanced occlusion handling capabilities of the new dynamic programming algorithm are evaluated against those of the most powerful state-of-the-art dynamic programming and graph-cut techniques. Four-state DP is also evaluated against the disparity-based Middlebury error metrics, and its performance is found to be amongst the best of the efficient algorithms. A number of examples demonstrate the robustness of four-state DP to artefacts in stereo video streams. These include demonstrations of cyclopean view synthesis in extended conversational sequences, synthesis from a freely translating virtual camera and, finally, basic 3D scene editing. © 2006 Springer Science + Business Media, LLC.
DOI	10.1007/s11263-006-8525-1
Citation Key	Criminisi2007
