“Learning to be a depth camera for close-range human capture and interaction” by Fanello, Keskin, Izadi, Kohli, Shotton, et al. …

  • ©Sean Ryan Fanello, Cem Keskin, Shahram Izadi, Pushmeet Kohli, Jamie Shotton, Antonio Criminisi, David Kim, David Sweeney, and Sing Bing Kang



Session Title:

    Computational Sensing & Display


Title:

    Learning to be a depth camera for close-range human capture and interaction




Abstract:

    We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near-infrared intensity images to absolute, metric depth in real time. We demonstrate a variety of human-computer interaction and capture scenarios. Experiments show an accuracy that outperforms a conventional light fall-off baseline, and is comparable to high-quality consumer depth cameras, but with a dramatically reduced cost, power consumption, and form factor.
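    The light fall-off baseline mentioned in the abstract can be illustrated with a minimal sketch. It assumes the baseline follows the inverse-square law, I = k / d^2, with a single gain k calibrated from one reference measurement; the abstract does not spell out the baseline's formulation, so the model and the names `calibrate_gain`, `falloff_depth`, and `depth_map` are hypothetical and used for illustration only.

```python
import math


def calibrate_gain(intensity: float, known_depth_m: float) -> float:
    """Solve I = k / d^2 for k from one pixel whose depth is known.

    Assumption: intensity is a linear, positive sensor reading and the
    scene is lit only by an illuminant placed next to the camera.
    """
    if intensity <= 0 or known_depth_m <= 0:
        raise ValueError("intensity and depth must be positive")
    return intensity * known_depth_m ** 2


def falloff_depth(intensity: float, gain: float) -> float:
    """Invert I = k / d^2 to estimate depth d = sqrt(k / I) in metres."""
    if intensity <= 0:
        # No returned light: treat the pixel as out of range.
        return math.inf
    return math.sqrt(gain / intensity)


def depth_map(image, gain):
    """Apply the per-pixel inversion to a 2D list of intensities."""
    return [[falloff_depth(pixel, gain) for pixel in row] for row in image]


if __name__ == "__main__":
    # Reference pixel: intensity 0.8 observed at 0.25 m.
    k = calibrate_gain(0.8, 0.25)
    # A quarter of the reference intensity implies twice the distance.
    print(depth_map([[0.8, 0.2]], k))
```

    A single global gain ignores surface albedo and orientation, both of which also change the observed intensity. That limitation is one plausible reason a learned mapping from intensity to depth could outperform such a baseline.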



ACM Digital Library Publication: