Learning Viewpoint Invariant Representations of Faces in an
Attractor Network
Marian Stewart Bartlett and Terrence J. Sejnowski
Paper presented at the 1st Annual Meeting of the University of California
Multicampus Research Group in Vision Modeling, UC Irvine, October 16,
1996.
Abstract
In natural visual experience, different views of an object or face tend to
appear in close temporal proximity as an animal manipulates the object or
navigates around it, or as a face changes expression or pose. One way to
learn to recognize objects despite changes in viewpoint is to associate
patterns that occur close together in time. Capturing the temporal
relationships among patterns makes it possible to associate different
views of an object automatically, without requiring three-dimensional
structural descriptions. We present a set of simulations demonstrating how
viewpoint-invariant representations can be developed from visual
experience through unsupervised learning that captures these temporal
relationships among the input patterns.
We explored two mechanisms for developing viewpoint-invariant
representations of gray-level images of faces:

1. Competitive Hebbian learning of feedforward connections with a lowpass
temporal filter on the activity of the post-synaptic unit (Bartlett &
Sejnowski, 1996; Földiák, 1991). A sketch of this trace rule appears
after this list.

2. An attractor network that combines Hebbian learning with a lowpass
temporal filter on unit activities. When the input patterns to an
attractor network are passed through a lowpass temporal filter, a basic
Hebbian weight update rule takes a form related to that of Griniasty,
Tsodyks & Amit (1993), which associates temporally proximal input
patterns into common basins of attraction; see the second sketch below.

We implement these two mechanisms in a model with both feedforward and
lateral components.
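
To make the first mechanism concrete, the following is a minimal sketch,
in Python/NumPy, of competitive Hebbian learning with a lowpass temporal
filter (trace) on post-synaptic activity, in the spirit of Földiák (1991).
The winner-take-all competition, the filter constant lam, the learning
rate eta, and the weight normalization are illustrative assumptions, not
the exact formulation used in these simulations.

    import numpy as np

    def trace_rule_step(W, x, y_trace, lam=0.5, eta=0.01):
        # W:        (n_out, n_in) feedforward weights
        # x:        (n_in,) current input pattern
        # y_trace:  (n_out,) lowpass-filtered post-synaptic activity
        # lam, eta: trace constant and learning rate (assumed values)

        # Feedforward activation with winner-take-all competition
        # (one simple form of competitive learning; an assumption here).
        a = W @ x
        y = np.zeros_like(a)
        y[np.argmax(a)] = 1.0

        # Lowpass temporal filter (trace) on post-synaptic activity:
        # y_trace(t) = (1 - lam) * y(t) + lam * y_trace(t - 1)
        y_trace = (1.0 - lam) * y + lam * y_trace

        # Hebbian update driven by the trace, so views that occur close
        # together in time strengthen the same output unit's weights.
        W = W + eta * np.outer(y_trace, x)
        # Normalize weight vectors to keep them bounded.
        W = W / np.linalg.norm(W, axis=1, keepdims=True)
        return W, y_trace

Because the trace decays slowly, an output unit that wins for one pose of
a face continues to receive Hebbian credit for the poses that follow it
in the sequence.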
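
The second mechanism can be sketched in the same style. In the Griniasty,
Tsodyks & Amit (1993) form, the lateral weight matrix contains, besides
the usual Hebbian autoassociative term, cross terms linking each pattern
to its temporal neighbors. The parameter a, the +/-1 pattern coding, and
the synchronous recall dynamics below are assumptions for illustration.

    import numpy as np

    def gta_weights(patterns, a=0.4):
        # patterns: (P, N) array of +/-1 patterns; row mu is the pattern
        # seen at time step mu (e.g., successive poses of one face).
        # a: strength of the cross-temporal terms (assumed value).
        patterns = np.asarray(patterns, dtype=float)
        P, N = patterns.shape
        # Standard Hebbian autoassociative term.
        J = patterns.T @ patterns
        # Cross terms linking each pattern to its temporal neighbors;
        # these merge temporally proximal views into one basin.
        for mu in range(P - 1):
            J += a * (np.outer(patterns[mu + 1], patterns[mu])
                      + np.outer(patterns[mu], patterns[mu + 1]))
        J = J / N
        np.fill_diagonal(J, 0.0)  # no self-connections
        return J

    def recall(J, state, steps=20):
        # Iterate the network to a fixed point (synchronous updates).
        for _ in range(steps):
            state = np.sign(J @ state)
            state[state == 0] = 1  # arbitrary tie-break
        return state

For suitable values of a, the attractor reached from one pattern is
correlated with that pattern's neighbors in the training sequence, which
is the sense in which temporally proximal views fall into one basin.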
Following training on sequences of gray-level images of faces as they
change pose, multiple views of a given face fall into the same basin of
attraction, and the system acquires representations of faces that are
largely independent of pose.