Micro Perceptual Human Computation for Visual Tasks

Yotam Gingold; Ariel Shamir; Daniel Cohen-Or

Conference:

SIGGRAPH 2012

Type(s):

Technical Papers

Title:

Micro Perceptual Human Computation for Visual Tasks

Presenter(s)/Author(s):

Yotam Gingold

Ariel Shamir

Daniel Cohen-Or

Abstract:

Human Computation (HC) utilizes humans to solve problems or carry out tasks that are hard for pure computational algorithms. Many graphics and vision problems have such tasks. Previous HC approaches mainly focus on generating data in batch, to gather benchmarks, or perform surveys demanding nontrivial interactions. We advocate a tighter integration of human computation into online, interactive algorithms. We aim to distill the differences between humans and computers and maximize the advantages of both in one algorithm. Our key idea is to decompose such a problem into a massive number of very simple, carefully designed, human micro-tasks that are based on perception, and whose answers can be combined algorithmically to solve the original problem. Our approach is inspired by previous work on micro-tasks and perception experiments. We present three specific examples for the design of micro perceptual human computation algorithms to extract depth layers and image normals from a single photograph, and to augment an image with high-level semantic information such as symmetry.
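The idea of combining many simple perceptual answers algorithmically can be illustrated with a small sketch. The code below is not the authors' algorithm; it assumes a hypothetical micro-task in which a worker is shown two image regions and reports which one looks closer to the camera. The function names, the "first"/"second" answer encoding, and the region labels are all invented for illustration. Redundant answers are merged by majority vote, and the resulting pairwise ordering is turned into depth layers with a topological pass.

```python
from collections import Counter, defaultdict, deque


def majority_vote(answers):
    """Merge redundant worker answers: keep the most common answer per task."""
    votes = defaultdict(Counter)
    for task, answer in answers:
        votes[task][answer] += 1
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


def depth_layers(regions, in_front):
    """Assign a layer index to each region (0 = nearest).

    `in_front` holds (a, b) pairs meaning "a is in front of b".
    Assumes the pairs are consistent; raises ValueError if they form a cycle.
    """
    behind = defaultdict(set)
    indegree = {r: 0 for r in regions}
    for a, b in in_front:
        if b not in behind[a]:
            behind[a].add(b)
            indegree[b] += 1

    layer = {r: 0 for r in regions}
    queue = deque(r for r in regions if indegree[r] == 0)
    visited = 0
    while queue:
        a = queue.popleft()
        visited += 1
        for b in behind[a]:
            # A region sits one layer behind the deepest region in front of it.
            layer[b] = max(layer[b], layer[a] + 1)
            indegree[b] -= 1
            if indegree[b] == 0:
                queue.append(b)
    if visited != len(regions):
        raise ValueError("contradictory answers form a cycle")
    return layer


# Hypothetical answers: three workers per pair, "first" means the first
# region of the pair was judged to be closer to the viewer.
answers = [
    (("tree", "house"), "first"), (("tree", "house"), "first"), (("tree", "house"), "second"),
    (("house", "sky"), "first"), (("house", "sky"), "first"), (("house", "sky"), "first"),
    (("tree", "sky"), "first"), (("tree", "sky"), "second"), (("tree", "sky"), "first"),
]
decided = majority_vote(answers)
in_front = [(a, b) if ans == "first" else (b, a) for (a, b), ans in decided.items()]
layers = depth_layers(["tree", "house", "sky"], in_front)
print(layers)
```

Majority voting is only one possible way to handle disagreement between workers, and a real system would also need a policy for contradictory (cyclic) answers, which this sketch simply rejects.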

