Challenging Data for Stereo and Optical Flow

Abstract

Selected scenes for stereo disparity and optical flow estimation, each containing as-yet unsolved challenges.

Introduction

Currently, only a few test sequences for optical flow and stereo are available. Most of them show highly controlled indoor scenes and lack the complexity that is commonly encountered in outdoor environments. Our aim is to provide new, challenging outdoor data to stimulate research in computer vision.
We acquired several million frames with a carefully devised stereo camera system. The recorded scenes cover a huge variety of weather conditions and of motion and depth layers; they contain city and countryside situations and were acquired by day and by night. From this large quantity of data we selected eleven scenes, each containing a different challenge and highlighting problems that occur regularly.
We estimated optical flow and stereo on 10,000 manually selected frames and found that state-of-the-art algorithms frequently fail to estimate reliable correspondences in the situations summarized in the selected scenes. We observed that these situations fundamentally violate common model assumptions such as brightness constancy and a single motion per pixel. By providing this highly challenging data, we would also like to encourage alternative approaches that open up new ways to deal with the occurring problems.
On this webpage, the challenges in these frames are described and links for downloading the sequences are provided. To keep the data manageable, each scene contains about 30 frames, and the keyframes for which the described phenomenon is most explicit are named. Furthermore, we provide a detailed description of the recording and calibration procedures, and we supply code to simplify working with the data and visualizing the results.

This dataset is part of the Robust Vision Challenge; see the challenge website for more information.

Scene Description and Visualization of Correspondences on Reference Frame

Each scene is named with keywords that hint at its principal content. In this section, a more detailed description of the challenge in each scene is provided together with a reference frame in which the problem is most evident.
Additionally, we provide exemplary results on the reference frame to visualize some of the problems. The reference stereo results have been computed with a self-implemented variant of the SGM method using the rank filter, as proposed by Hirschmüller and Scharstein [PDF]. The optical flow results have been computed with Deqing Sun's implementation of Black and Anandan's method and with his own non-local method, both described in his paper, for which public MATLAB code is available. The parameters used are available upon request.
The results we show are based on the respective algorithm's output with standard parameters across all scenes. As the data is highly challenging and the parameters are not tuned, the results should not be regarded as the best these algorithms can achieve.
Please also note that these methods have not been developed for such difficult scenes - we merely used them because, to the best of our knowledge, no better or more specialized ones exist. Detailed information on the visualization method we used can be found here.
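
To make the rank filter mentioned above concrete, the sketch below shows the classic rank transform on which such matching costs are based: each pixel is replaced by the number of pixels in a surrounding window that are darker than the center, which makes subsequent matching robust to monotonic brightness changes. This is a generic textbook formulation; the Image container and the window radius are illustrative assumptions, not the code used to produce the results on this page.

    #include <cstdint>
    #include <vector>

    // Minimal grayscale image container (illustrative, not the project's own type).
    struct Image {
        int width = 0, height = 0;
        std::vector<uint16_t> data;  // 12-bit values stored in 16 bits
        uint16_t at(int x, int y) const { return data[y * width + x]; }
    };

    // Rank transform: each pixel becomes the count of pixels in a
    // (2*radius+1)^2 window whose intensity is strictly below the center.
    // Matching costs computed on the transformed images are invariant to
    // monotonic illumination changes.
    Image rankTransform(const Image& in, int radius = 2) {
        Image out{in.width, in.height,
                  std::vector<uint16_t>(in.data.size(), 0)};
        for (int y = radius; y < in.height - radius; ++y)
            for (int x = radius; x < in.width - radius; ++x) {
                const uint16_t center = in.at(x, y);
                uint16_t rank = 0;
                for (int dy = -radius; dy <= radius; ++dy)
                    for (int dx = -radius; dx <= radius; ++dx)
                        if (in.at(x + dx, y + dy) < center) ++rank;
                out.data[y * out.width + x] = rank;
            }
        return out;
    }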

The eleven scenes are: Blinking Arrow, Car Truck, Crossing Cars, Flying Snow, Night and Snow, Rain Blur, Rain Flares, Reflecting Car, Shadow on Truck, Sunflare, and Wet Autobahn.

Data Generation

We recorded the data with a high-speed, high-resolution stereo camera system. A publication with the motivation and a detailed description of the recording procedure will appear in February 2012. Meanwhile, a full technical report can be found here.

Data Processing

We recorded the sequences with a resolution of 1312x1082@12bit and a framerate of 100Hz. The baseline distance was around 30cm. All on-camera preprocessing was turned off.
The following operations have been performed on the images (illustrative code sketches for several of these steps follow the list):

  • We computed a radiometric calibration using an Ulbricht-Sphere (integrating sphere) and estimated a quadratic response curve for each pixel (two parameters).
  • Before each recording session we estimated the dark current by averaging 100 frames with a closed lens cap.
  • Before each recording session both cameras were calibrated based on a method similar to http://www.ipb.uni-bonn.de/uploads/tx_ikgpublication/hau97.calibration.pdf. This means that we have removed spherical distortions and aligned the images with respect to their epipolar geometry (horizontal lines in the first image correspond to the same horizontal lines in the second image). The camera parameters are available as an extra download below.
  • The dark current image was subtracted from each camera frame and the inverse of each response curve was used to linearize the intensities. Hot and defective pixels were identified based on a heuristic (significant local intensity deviations throughout all sequences) and removed by applying a 3x3 median filter.
  • Each frame (both left and right) was rectified based on a lookup-table which was created with the camera calibration result.
  • The resulting (both radiometrically and geometrically rectified) image pair is downscaled to a size of 656x541@12bit by averaging each 2x2 block of four pixels into a single pixel. We chose this simple method explicitly to preserve the natural noise in the images as much as possible. Every fourth image (to reach an effective framerate of 25Hz) is stored as a PGM file.
  • Saturated pixels are not processed.
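
To make the radiometric steps above concrete, here is a minimal linearization sketch. The exact parameterization of the two-parameter response curve is documented in the technical report; purely for illustration we assume the model raw = dark + a*E + b*E^2 in the linear value E, so linearization amounts to subtracting the dark frame and taking the positive root of the quadratic. All names and the data layout are our assumptions.

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <vector>

    // Per-pixel linearization sketch. Assumed response model (illustration only):
    //   raw = dark + a*E + b*E^2
    // where E is the linear value to recover and (a, b) are the two
    // per-pixel response parameters.
    std::vector<float> linearize(const std::vector<uint16_t>& raw,
                                 const std::vector<float>& dark,  // mean of 100 dark frames
                                 const std::vector<float>& a,
                                 const std::vector<float>& b) {
        std::vector<float> out(raw.size());
        for (size_t i = 0; i < raw.size(); ++i) {
            // 1. Subtract the dark current estimated with a closed lens cap.
            const float m = std::max(0.0f, static_cast<float>(raw[i]) - dark[i]);
            // 2. Invert the quadratic response: solve b*E^2 + a*E - m = 0
            //    for the positive root; fall back to the linear term if b ~ 0.
            out[i] = (std::abs(b[i]) > 1e-12f)
                         ? (-a[i] + std::sqrt(a[i] * a[i] + 4.0f * b[i] * m)) / (2.0f * b[i])
                         : m / a[i];
        }
        return out;
    }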
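
The lookup-table rectification can likewise be sketched as a plain remap: the table stores, for every target pixel, the sub-pixel source position computed once from the calibration, and each frame is warped by bilinear interpolation. The Lut layout below is an assumption for illustration, not the dataset's actual calibration file format.

    #include <vector>

    // One sub-pixel source coordinate per target pixel, precomputed from
    // the camera calibration (illustrative layout).
    struct Lut { std::vector<float> srcX, srcY; };

    // Warp one frame through the lookup table with bilinear interpolation;
    // target pixels mapping outside the source image stay zero.
    std::vector<float> rectify(const std::vector<float>& src, int w, int h,
                               const Lut& lut) {
        std::vector<float> dst(src.size(), 0.0f);
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x) {
                const float fx = lut.srcX[y * w + x], fy = lut.srcY[y * w + x];
                const int x0 = static_cast<int>(fx), y0 = static_cast<int>(fy);
                if (fx < 0 || fy < 0 || x0 + 1 >= w || y0 + 1 >= h) continue;
                const float ax = fx - x0, ay = fy - y0;  // interpolation weights
                dst[y * w + x] =
                    (1 - ay) * ((1 - ax) * src[y0 * w + x0] + ax * src[y0 * w + x0 + 1]) +
                    ay * ((1 - ax) * src[(y0 + 1) * w + x0] + ax * src[(y0 + 1) * w + x0 + 1]);
            }
        return dst;
    }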
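
The final downscaling step is deliberately simple block averaging; a sketch follows (the rounding mode is our assumption):

    #include <cstdint>
    #include <vector>

    // Downscale by averaging each non-overlapping 2x2 block into one pixel
    // (1312x1082 -> 656x541). Plain block averaging is used instead of a
    // smoothing pyramid so that the natural sensor noise is preserved.
    std::vector<uint16_t> downscale2x2(const std::vector<uint16_t>& src,
                                       int width, int height) {
        const int w2 = width / 2, h2 = height / 2;
        std::vector<uint16_t> dst(static_cast<size_t>(w2) * h2);
        for (int y = 0; y < h2; ++y)
            for (int x = 0; x < w2; ++x) {
                const int sx = 2 * x, sy = 2 * y;
                const unsigned sum = src[sy * width + sx] + src[sy * width + sx + 1] +
                                     src[(sy + 1) * width + sx] +
                                     src[(sy + 1) * width + sx + 1];
                dst[y * w2 + x] = static_cast<uint16_t>((sum + 2) / 4);  // rounded mean
            }
        return dst;
    }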

Privacy Protection

The recorded data shows challenging traffic scenes as they occur in real life and therefore contains license plates and pedestrians. To protect the privacy of the traffic participants, we removed all information that can be traced back to individuals. As we attach great importance to the protection of privacy and compliance with national privacy laws, we chose to accept this interference with the raw data.
To keep the interference as small as possible, we manually labeled all pedestrians and license plates in the scenes and removed only the high-frequency data that contains the private information, using a Gaussian blur filter. We chose a large variance to render reverse engineering of the original data impossible.
To prevent the introduction of spurious gradients at the transition between filtered and original regions, we define a boundary region in which filtered and unfiltered parts are blended together smoothly. With this procedure we ensure that neither new high-frequency content nor new image gradients are introduced.
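
As an illustration of this kind of masked, feathered blurring, here is a minimal sketch: blurring the binary privacy mask itself with a Gaussian produces a smooth alpha ramp at the region boundary, so the blend introduces neither new high frequencies nor new gradients. The function names, sigma parameters, and the choice of feathering the mask with a Gaussian are our assumptions, not the exact procedure used for the dataset.

    #include <cmath>
    #include <vector>

    // Separable Gaussian blur on a single-channel float image (clamped borders).
    std::vector<float> gaussianBlur(const std::vector<float>& src,
                                    int w, int h, float sigma) {
        const int r = static_cast<int>(std::ceil(3.0f * sigma));
        std::vector<float> k(2 * r + 1);
        float norm = 0.0f;
        for (int i = -r; i <= r; ++i)
            norm += (k[i + r] = std::exp(-0.5f * i * i / (sigma * sigma)));
        for (float& v : k) v /= norm;

        auto clamp = [](int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); };
        std::vector<float> tmp(src.size()), dst(src.size());
        for (int y = 0; y < h; ++y)  // horizontal pass
            for (int x = 0; x < w; ++x) {
                float s = 0.0f;
                for (int i = -r; i <= r; ++i)
                    s += k[i + r] * src[y * w + clamp(x + i, 0, w - 1)];
                tmp[y * w + x] = s;
            }
        for (int y = 0; y < h; ++y)  // vertical pass
            for (int x = 0; x < w; ++x) {
                float s = 0.0f;
                for (int i = -r; i <= r; ++i)
                    s += k[i + r] * tmp[clamp(y + i, 0, h - 1) * w + x];
                dst[y * w + x] = s;
            }
        return dst;
    }

    // Blend a strongly blurred copy into the labeled regions. Feathering the
    // binary mask yields a smooth alpha ramp at the region boundary.
    std::vector<float> anonymize(const std::vector<float>& img, int w, int h,
                                 const std::vector<float>& mask,  // 1 = private region
                                 float sigmaBlur, float sigmaFeather) {
        const std::vector<float> blurred = gaussianBlur(img, w, h, sigmaBlur);
        const std::vector<float> alpha = gaussianBlur(mask, w, h, sigmaFeather);
        std::vector<float> out(img.size());
        for (size_t i = 0; i < img.size(); ++i)
            out[i] = alpha[i] * blurred[i] + (1.0f - alpha[i]) * img[i];
        return out;
    }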
Practical comparisons of optical flow and stereo correspondences estimated on both the original and the modified images show that the removal of private information does not interfere with the performance of the tested algorithms.
Along with the modified images, we offer binary masks for download in which modified pixels are marked with a non-zero value.

Code

We provide different packages of C++ and MATLAB code that we used to produce some of the results on this page. When using this code, please note the disclaimer provided with the software.

  • In this project we work with 12-bit grayscale .pgm images. Unfortunately, MATLAB's imread() function does not support 12-bit images. You can use this function as a replacement, maintaining the syntax and options of imread(). (A minimal C++ reader sketch follows this list.)
  • Results on this page were displayed with our visualization code. To compare your results easily to the ones given here, you can use the same visualization methods, provided as MATLAB and C++ sources.
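
For readers working in C++, the 12-bit PGM files can also be read directly; below is a minimal sketch of a binary PGM (P5) reader under the usual convention that samples with maxval above 255 are stored as two bytes, most significant byte first. It is an illustrative reader, not the project's own I/O code.

    #include <cstdint>
    #include <fstream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    // Minimal binary PGM (P5) reader for 12-bit images. Comment lines
    // ('#') in the header are skipped.
    std::vector<uint16_t> readPgm12(const std::string& path, int& width, int& height) {
        std::ifstream f(path, std::ios::binary);
        if (!f) throw std::runtime_error("cannot open " + path);

        auto nextToken = [&f]() {
            std::string t;
            while (f >> t) {
                if (t[0] == '#') std::getline(f, t);  // skip comment line
                else return t;
            }
            throw std::runtime_error("truncated PGM header");
        };

        if (nextToken() != "P5") throw std::runtime_error("not a binary PGM");
        width = std::stoi(nextToken());
        height = std::stoi(nextToken());
        const int maxval = std::stoi(nextToken());
        if (maxval < 256 || maxval > 65535)
            throw std::runtime_error("expected two-byte samples");
        f.get();  // single whitespace byte separating header and pixel data

        std::vector<uint16_t> img(static_cast<size_t>(width) * height);
        for (auto& px : img) {
            const int hi = f.get(), lo = f.get();  // big-endian sample
            px = static_cast<uint16_t>((hi << 8) | lo);
        }
        if (!f) throw std::runtime_error("truncated pixel data");
        return img;
    }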

Usage of the Data

This data is to be used for research purposes only.

In case you would like to publish work based on this data, please cite the following article:
@article{meister2012outdoor,
title={Outdoor stereo camera system for the generation of real-world benchmark data sets},
author={Meister, S. and J{\"a}hne, B. and Kondermann, D.},
journal={Optical Engineering},
volume={51},
number={02},
pages={021107},
year={2012}
}

Related Work

A well-known set of short image sequences and stereo pairs with ground truth has been made public via the Middlebury Benchmark Website which is being maintained by Prof. Daniel Scharstein, Middlebury College, USA.
The first dataset with long image sequences, partly augmented with ground truth for stereo as well as optical flow estimation, was the .enpeda.-Database created by the group of Prof. Reinhard Klette in Auckland, New Zealand.
More recently, Andreas Geiger and colleagues in the groups of Prof. Christoph Stiller (Karlsruhe Institute of Technology) and Prof. Raquel Urtasun (Toyota Technological Institute, USA) published the KITTI Vision Benchmark Suite.
If you have published a similar dataset, please let us know, so we can include it in this article.

Acknowledgements

The present data was acquired and processed by Daniel Kondermann (HCI), Stephan Meister (HCI), Paul-Sebastian Lauer (Bosch Corporate Research) and Anita Sellent (Bosch Corporate Research) in collaboration with the Robert Bosch GmbH. The work was supported by Bernd Jähne (HCI), Wolfgang Niehsen (Bosch Corporate Research) and Jochen Wingbermühle (Bosch Corporate Research). Furthermore, our HCI student research assistants Annika Berger, Julian Coordts, Tobias Preatsch and Christoph Koke spent countless hours assisting with the preparation of the data.

Dataset Download

The dataset contains eleven challenging sequences for stereo and optical flow estimation. Please fill out the form below and we will send you a download link.