End-to-end depth from motion with stabilized monocular videos

Clément Pinard1,2, Laure Chevalley2, Antoine Manzanera1, David Filliat1

1ENSTA, Computer Science and System Engineering Department


We propose a depth map inference system for monocular videos, based on a novel navigation dataset that mimics aerial footage from a gimbal-stabilized monocular camera in rigid scenes. Unlike most navigation datasets, the absence of rotation makes the structure-from-motion problem easier, which can be leveraged for tasks such as depth inference and obstacle avoidance. We also propose an architecture for end-to-end depth inference with a fully convolutional network. Results show that, although tied to the camera's intrinsic parameters, the problem is locally solvable and leads to good-quality depth predictions.
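To see why the absence of rotation simplifies structure from motion, consider a camera undergoing pure lateral translation: each pixel's optical flow magnitude is then inversely proportional to its depth, exactly as in the stereo disparity relation. The sketch below illustrates this with illustrative numbers (the focal length, displacement, and flow values are hypothetical, not taken from the papers):

```python
import numpy as np

# For pure lateral translation t_x with no rotation, horizontal flow u
# satisfies u = f * t_x / Z, so depth is recovered as Z = f * t_x / u
# (the classic stereo relation, with t_x playing the role of the baseline).
f = 300.0    # focal length in pixels (hypothetical)
t_x = 0.5    # camera displacement between frames, in meters (hypothetical)
u = np.array([15.0, 7.5, 3.0])  # measured horizontal optical flow, in pixels

Z = f * t_x / u  # per-pixel depth in meters -> [10., 20., 50.]
```

With rotation present, the flow field would contain an additional depth-independent component, and this direct inversion would no longer hold; stabilization removes that component.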

We also propose a multi-range architecture for unconstrained UAV flight, leveraging flight data from onboard sensors to produce accurate depth maps in uncluttered outdoor environments. We evaluate our algorithm on both synthetic scenes and real UAV flight data. Quantitative results on synthetic scenes with slightly noisy orientation show that our multi-range architecture improves depth inference.
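Since depth recovered from translation scales linearly with the baseline, flight-sensor data can be used to rescale a prediction made at a fixed nominal displacement. A minimal sketch of this idea, assuming a hypothetical training displacement `d_train` and a sensor-measured displacement `d_real` (both values illustrative, not from the papers):

```python
# Depth is proportional to the translation baseline, so a network trained
# with a nominal inter-frame displacement can have its output rescaled
# to metric depth using the displacement measured by flight sensors.
d_train = 0.3  # displacement assumed during training, in meters (hypothetical)
d_real = 1.2   # displacement measured from flight sensors (hypothetical)

predicted_depth = 8.0  # network output for a frame pair, in training units
metric_depth = predicted_depth * d_real / d_train  # ~ 32 m
```

Choosing frame pairs with different spacings in this way yields predictions tuned to different depth ranges, which is the intuition behind a multi-range scheme.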

End-to-end depth from motion with stabilized monocular videos

In UAVg 2017 (Oral)

Multi range Real-time depth inference from a monocular stabilized footage using a Fully Convolutional Neural Network

In ECMR 2017 (Poster)

Code to replicate the first paper's results is available on GitHub.

GitHub Repository

The Still Box dataset comes in 4 different image sizes. Here is a brief recap:

Image size   Number of scenes   Total size (GB)   Compressed size (GB)
64x64        80K                19                9.8
128x128      16K                12                7.1
256x256      3.2K               8.5               5
512x512      3.2K               33                19
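For planning storage, the per-size figures in the table can be summed directly. A trivial recap in Python (the dictionary layout is just one convenient encoding of the table above):

```python
# Sizes in GB, copied from the Still Box dataset table.
total_gb = {"64x64": 19, "128x128": 12, "256x256": 8.5, "512x512": 33}
compressed_gb = {"64x64": 9.8, "128x128": 7.1, "256x256": 5, "512x512": 19}

print(sum(total_gb.values()))       # 72.5 GB uncompressed
print(sum(compressed_gb.values()))  # ~40.9 GB compressed
```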
Downloads: README Gist · Torrent Link · Magnet Link · Academic Torrents Page

For more information about this dataset, please refer to our first paper.