“Implicit and explicit visual learning as it relates to machine vision systems” by Garaas, Xiao and Pomplun

  • ©Tyler Garaas, Mei Xiao, and Marc Pomplun




    Implicit and explicit visual learning as it relates to machine vision systems



    Recent computer and robotic vision systems increasingly borrow from research into biological vision. The reasoning is that evolution has already solved many of the problems that designers of artificial vision systems now face.

    One area in which designers have borrowed from nature is attention. Attention is employed successfully in many different forms to reduce the cost of computation on the input data a machine vision system receives, and it is often modeled on what we have learned about the human visual attention system.

    Another aspect of the human vision system under examination is the ability to build a symbolic representation of the visual environment from scratch; such an ability brings a myriad of benefits and opens many new possibilities. Every human was once a baby with no internal representation of line, color, or form. These representations were learned by slowly observing the world around us and were later associated with linguistic representations to interrelate them. Many now believe that any robust machine vision system will need to develop in a similar manner.

    Attention combined with learning can provide a powerful tool for creating machine vision systems. Past research has shown, however, that much of the learning that occurs in humans comes from unattended stimuli, a process referred to as implicit learning. Consequently, machine vision systems that exploit implicit learning may see large gains.

    The experiment presented here aims to examine the importance of explicit learning as compared to implicit learning.
In addition, it will look at how attention changes as participants become conditioned to an environment, with the ultimate goal of applying what is learned to the design of a machine vision system that not only builds its own representation of the visual environment, but does so from both attended and unattended data.

    Participants are divided into two groups. Group 1 is used to measure the performance gains from implicit learning; Group 2 is used to measure the performance gains from explicit learning. Visual attention is recorded using the Eye-Link II eye tracker, a piece of headgear fitted with cameras that pinpoint gaze position. 3D gaze position is determined using a neural network calibration program.

    The task is simple and familiar to many researchers. Each participant is placed in a sufficiently large maze, displayed by an OpenGL program on an autostereoscopic monitor. The participant's goal is to find the cheese hidden somewhere within the maze. The maze is similar in design to those used to measure learning in rodents, except that the walls are textured with a light-red and dark-red brick pattern.

    The exact same maze is used for every trial in the experiment, but there are twenty versions of the task: ten for Group 1 and ten for Group 2. Group 1 participants always start in the same location but see ten different brick patterns; each brick pattern corresponds one-to-one to one of ten different cheese locations. Group 2 participants start in ten different locations with a single brick pattern; each starting location corresponds one-to-one to one of ten different cheese locations. Each participant completes two epochs a day for five consecutive days.
An epoch consists of one run through each of the participant's ten versions of the maze, with the order of presentation pseudo-randomized between epochs.

    The collected data will consist of the measured running time for each maze and the gaze positions recorded by the Eye-Link II throughout each run. At the end of the experiment, participants will be given a questionnaire, one question of which is aimed at identifying any strategies they employed to work out the location of the cheese.

    Performance gains for participants in Group 1 can be attributed to implicit learning of the association between brick pattern and cheese location, provided they did not work out the association and report it on the questionnaire. Performance gains for participants in Group 2, by contrast, can be attributed to explicit learning, because it is the participants' own navigation decisions, which must always be attended to, that key the cheese location.

    Aside from separating the benefits of implicit and explicit learning, the experiment aims to track differences in visual attention as participants become familiar with the tasks and environment. This may also have important repercussions for an attentive, learning machine vision system by providing a new model of what exactly the system should attend to.

    Very preliminary results suggest that both groups of participants learn, but it is too early to estimate the extent of that learning.
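
The trial schedule described above (ten maze versions per group, two epochs a day for five days, pseudo-random order within each epoch) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' software: the function names, the dictionary keys, the integer indices standing in for start locations, brick patterns, and cheese locations, and the use of a seeded shuffle as the pseudo-random ordering are all assumptions made for the example.

```python
import random

# Counts taken from the experiment description.
N_VERSIONS = 10       # maze versions per group
EPOCHS_PER_DAY = 2
N_DAYS = 5


def build_conditions(group):
    """Return the ten task versions for a group (hypothetical encoding).

    Start locations, brick patterns, and cheese locations are represented
    by placeholder integer indices; the real values are not given in the text.
    """
    if group == 1:
        # Same start, ten brick patterns, each tied to one cheese location.
        return [{"start": 0, "pattern": i, "cheese": i} for i in range(N_VERSIONS)]
    if group == 2:
        # One brick pattern, ten starts, each tied to one cheese location.
        return [{"start": i, "pattern": 0, "cheese": i} for i in range(N_VERSIONS)]
    raise ValueError("group must be 1 or 2")


def build_schedule(group, seed):
    """Return one participant's epochs, each with a shuffled trial order."""
    rng = random.Random(seed)  # seeded so a schedule can be reproduced
    conditions = build_conditions(group)
    schedule = []
    for day in range(1, N_DAYS + 1):
        for epoch in range(1, EPOCHS_PER_DAY + 1):
            trials = list(conditions)
            rng.shuffle(trials)  # assumed form of the pseudo-random ordering
            schedule.append({"day": day, "epoch": epoch, "trials": trials})
    return schedule
```

Calling `build_schedule(1, seed=42)` would yield ten epochs of ten trials each for a Group 1 participant. Seeding the generator per participant is one possible design choice that keeps a presentation order reproducible for later analysis of running times.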


    1. Essig, K., Pomplun, M. & Ritter, H. (2006) A Neural Network for 3D Gaze Recording with Binocular Eye Trackers. International Journal of Parallel, Emergent, and Distributed Systems. 21 (2), 79–95
    2. Jiang, Y. & Leung, A. (2005) Implicit Learning of Ignored Visual Context. Psychonomic Bulletin & Review. 12 (1), 100–106
