Learning Viewpoint Invariant Face Representations from Visual
Experience in an Attractor Network
Marian Stewart Bartlett and Terrence J. Sejnowski
Network: Computation in Neural Systems, 9(3):1-19, 1998.
Abstract
In natural visual experience, different views of an object or face tend to
appear in close temporal proximity as an animal manipulates the object or
navigates around it, or as a face changes expression or pose. A set of
simulations is presented that demonstrates how viewpoint-invariant
representations of faces can be developed from visual experience by
capturing the temporal relationships among the input patterns. The
simulations explored the interaction of temporal smoothing of activity
signals with Hebbian learning (Foldiak, 1991) in both a feedforward layer
and a second, recurrent layer of a network. The feedforward connections
were trained by Competitive Hebbian Learning with temporal smoothing of the
post-synaptic unit activities (Bartlett & Sejnowski, 1996).
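For concreteness, here is a minimal sketch of one such feedforward training step, assuming a winner-take-all form of competition; the function name, learning rate, and smoothing constant are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def competitive_trace_update(W, x, y_trace, lr=0.02, delta=0.5):
    # W: weights, shape (n_outputs, n_inputs); x: input pattern;
    # y_trace: running low-pass filter of the post-synaptic activity.
    # Winner-take-all competition on the feedforward activations
    y = np.zeros(W.shape[0])
    y[np.argmax(W @ x)] = 1.0
    # Temporal smoothing: blend the current activity with its history,
    # so inputs arriving close together in time share credit
    y_trace = (1.0 - delta) * y + delta * y_trace
    # Hebbian update driven by the smoothed trace, not the
    # instantaneous activity
    W += lr * np.outer(y_trace, x)
    # Renormalize rows so the competition stays well behaved
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    return W, y_trace
```

Iterating this update over pose sequences presented in temporal order encourages successive views of the same face to recruit the same output unit.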
The recurrent layer was a generalization of a Hopfield network with a lowpass temporal
filter on all unit activities. The combination of basic Hebbian learning
with temporal smoothing of unit activities produced an attractor network
learning rule that associated temporally proximal input patterns into
basins of attraction. These two mechanisms were demonstrated in a model
that took gray-level images of faces as input. Following training on image
sequences of faces as they changed pose, multiple views of a given face
fell into the same basin of attraction, and the system acquired
representations of faces that were approximately viewpoint invariant.
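The attractor-network rule can be sketched in the same spirit: basic Hebbian (outer-product) learning applied to low-pass filtered unit activities, and recall dynamics that carry the same temporal filter. The bipolar coding, smoothing constant, and update schedule below are assumptions for illustration, not the paper's actual network:

```python
import numpy as np

def hebbian_attractor_weights(sequence, delta=0.5):
    # sequence: array of shape (T, n), rows are bipolar (+1/-1)
    # patterns presented in temporal order.
    n = sequence.shape[1]
    W = np.zeros((n, n))
    trace = np.zeros(n)
    for p in sequence:
        # Low-pass filter the unit activities across time steps
        trace = (1.0 - delta) * p + delta * trace
        # Basic Hebb rule on the smoothed activities; the trace mixes
        # neighboring patterns, so temporally proximal inputs become
        # cross-associated and share a basin of attraction
        W += np.outer(trace, trace)
    np.fill_diagonal(W, 0.0)  # no self-connections
    return W / len(sequence)

def settle(W, x, steps=50, delta=0.5):
    # Recurrent dynamics with the same low-pass filter on activities;
    # iterate until the state relaxes into an attractor.
    state = x.astype(float)
    for _ in range(steps):
        drive = np.where(W @ state >= 0.0, 1.0, -1.0)
        state = (1.0 - delta) * drive + delta * state
    return np.where(state >= 0.0, 1.0, -1.0)
```

Under this scheme, presenting any stored view of a face and running the settling dynamics should drive the state toward a shared attractor for that face, which is the sense in which the learned representation is approximately viewpoint invariant.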