“3D exploitation of 2D video from a hand-launched aerial glider” by Cho and Snavely

  • ©Peter Cho and Noah Snavely







    Compact cameras and robotic platforms are becoming ubiquitous as the performance of both transformational technologies rises and their prices fall. To encourage rapid experimentation with imaging sensors and robots, MIT Lincoln Laboratory held a Technology Challenge in September 2010 that involved remotely characterizing a 1 km² rural area. In this poster, we present video exploitation results from an aerial glider system fielded as part of the Technology Challenge. The aerial system’s hardware included a Radian sailplane glider (< $400), a Canon PowerShot camera (< $300) and a Garmin GPS unit (< $100). The camera and GPS clocks were synchronized by photographing the GPS unit with the camera. Both sensors were then mounted on the glider’s underside before hand launching. Video imagery was collected at 3 Hz and GPS readings at 1 Hz over 20-minute aerial missions that flew up to 430 meters above ground. After 3000 aerial video frames had been collected, they were processed via the 3D image reconstruction pipeline developed by Snavely et al. SIFT features are first extracted and matched across all images on a parallel computing grid. Skeletal graph analysis and incremental bundle adjustment then recover camera parameters and relative scene point positions. Multi-view stereo algorithms developed by Furukawa et al. next convert the initially sparse point cloud into a dense reconstruction of the ground scene. By fitting the camera’s reconstructed path to GPS measurements, we derive the global transformation needed to convert both the extrinsic camera parameters and the dense point cloud from relative to absolute world coordinates. We demonstrate several useful examples of geometry-based exploitation of reconstructed aerial video frames that are difficult to perform via conventional image processing.
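To illustrate the georegistration step described above (fitting the reconstructed camera path to GPS measurements), here is a minimal sketch rather than the authors' implementation. It assumes the global transformation is a 3D similarity (scale, rotation, translation) and estimates it with the closed-form least-squares solution of Umeyama (1991). It also assumes the GPS fixes have already been converted to a local metric frame and interpolated to the video frame times. The function names `fit_similarity` and `apply_similarity` and all numbers in the synthetic demo are hypothetical.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares s, R, t such that dst ~= s * R @ src + t (Umeyama, 1991).

    src: (N, 3) camera centers in the relative reconstruction frame.
    dst: (N, 3) matching GPS positions in a metric world frame (assumed).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0

    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(S) @ D) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t

def apply_similarity(s, R, t, pts):
    """Map (N, 3) points from the relative frame to the world frame."""
    return s * np.asarray(pts, dtype=float) @ R.T + t

# Synthetic check: build a "GPS" track from a known transform, then recover it.
rng = np.random.default_rng(0)
sfm_path = rng.normal(size=(50, 3))           # hypothetical relative camera centers
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
s_true = 12.5
t_true = np.array([100.0, -40.0, 250.0])
gps_path = s_true * sfm_path @ R_true.T + t_true

s_est, R_est, t_est = fit_similarity(sfm_path, gps_path)
residual = np.abs(apply_similarity(s_est, R_est, t_est, sfm_path) - gps_path).max()
print("estimated scale:", s_est, "max residual:", residual)
```

The same `apply_similarity` call can then be applied to the dense point cloud, so that cameras and scene points end up in one absolute frame. With real GPS data the fit would be noisy, and a robust variant (for example, discarding outlier fixes before fitting) may be preferable.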
First, our imaging hardware and our reconstruction and georegistration software yield detailed 3D terrain maps with absolute altitudes above sea level for ground, water and trees. The maps’ ground sampling distance of approximately 1 meter is comparable to that of active-illumination ladars, which cost orders of magnitude more than our inexpensive passive system. Second, video frames from small air vehicles that experience significant jostling are readily stabilized and orthorectified by projecting them onto a common Z-plane. 3D-based video stabilization results are far superior to those from 2D warping. Finally, we illustrate how any pixel of interest in one video frame can be identified with its counterpart pixels in other video frames via 3D backprojection and reprojection. These geometrical relationships will enable automatic annotation of aerial video footage in the future.
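The backprojection and reprojection idea can be sketched as follows. This is an illustrative simplification, not the authors' code. It assumes a pinhole camera model with camera coordinates given by R X + t, and it intersects the pixel's ray with a flat ground plane at a fixed Z instead of with the dense reconstruction. The function names, the intrinsic matrix `K` and the two camera poses are hypothetical values chosen for the demo.

```python
import numpy as np

def camera_center(R, t):
    """World position of a camera whose camera-frame coordinates are R @ X + t."""
    return -R.T @ t

def backproject_to_plane(pixel, K, R, t, z_plane=0.0):
    """Intersect the viewing ray through `pixel` with the plane Z = z_plane."""
    u, v = pixel
    # Ray direction in world coordinates.
    ray = R.T @ np.linalg.solve(K, np.array([u, v, 1.0]))
    C = camera_center(R, t)
    if abs(ray[2]) < 1e-12:
        raise ValueError("viewing ray is parallel to the plane")
    lam = (z_plane - C[2]) / ray[2]
    if lam <= 0:
        raise ValueError("plane lies behind the camera")
    return C + lam * ray

def project(X, K, R, t):
    """Project a 3D world point into pixel coordinates."""
    x = K @ (R @ X + t)
    return x[:2] / x[2]

# Hypothetical intrinsics shared by both frames.
K = np.array([[1000.0,    0.0, 320.0],
              [   0.0, 1000.0, 240.0],
              [   0.0,    0.0,   1.0]])

# Both cameras look straight down: the camera z-axis points along world -Z.
R_down = np.diag([1.0, -1.0, -1.0])
C1 = np.array([0.0, 0.0, 100.0])     # frame 1 camera center (assumed)
C2 = np.array([20.0, 10.0, 120.0])   # frame 2 camera center (assumed)
t1 = -R_down @ C1
t2 = -R_down @ C2

# Pick a pixel in frame 1, find its ground point, then locate it in frame 2.
pixel_1 = (370.0, 270.0)
ground_point = backproject_to_plane(pixel_1, K, R_down, t1, z_plane=0.0)
pixel_2 = project(ground_point, K, R_down, t2)
print("ground point:", ground_point)
print("counterpart pixel in frame 2:", pixel_2)
```

Mapping every pixel of a frame through `backproject_to_plane` and resampling on a regular ground grid is one way to realize projection onto a common Z-plane. Over terrain with relief or tall trees the flat-plane assumption breaks down, and the ray would need to be intersected with the reconstructed surface instead.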
