Face image analysis by unsupervised learning and redundancy reduction
Marian Stewart Bartlett
Doctoral Dissertation
University of California, San Diego
1998
Abstract
In a task such as face recognition, much of the important information may
be contained in the high-order relationships among the image pixels.
Representations such as "Eigenfaces" (Turk & Pentland, 1991) and "Holons"
(Cottrell & Metcalfe, 1991) are based on principal component analysis
(PCA), which encodes the correlational structure of the input, but
does not address high-order statistical dependencies such as relationships
among three or more pixels. Independent component analysis (ICA) is a
generalization of PCA which encodes the high-order dependencies in the
input in addition to the correlations. Representations for face recognition
were developed from the independent components of face images. The ICA
representations were superior to PCA for recognizing faces across sessions
and changes in expression.
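
As an illustration of the distinction drawn above, the sketch below derives PCA ("Eigenface"-style) and ICA codes from a set of flattened face images. It is a minimal sketch only: the placeholder data, the number of components, the FastICA estimator (the dissertation is not limited to this particular ICA algorithm), and the nearest-neighbor matching step are illustrative assumptions, not the dissertation's actual pipeline.

    import numpy as np
    from sklearn.decomposition import PCA, FastICA

    rng = np.random.default_rng(0)
    faces = rng.random((100, 60 * 60))   # placeholder: 100 flattened 60x60 face images

    # PCA ("Eigenfaces") captures only the second-order, correlational structure.
    pca = PCA(n_components=40)
    pca_codes = pca.fit_transform(faces)

    # ICA additionally seeks statistically independent, high-order structure.
    ica = FastICA(n_components=40, whiten="unit-variance", max_iter=1000, random_state=0)
    ica_codes = ica.fit_transform(faces)

    # Faces could then be matched across sessions by nearest-neighbor comparison
    # of these codes, e.g. with cosine similarity (an illustrative choice).
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
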
ICA was compared to more than eight other image analysis methods on a task
of recognizing facial expressions in a project to automate the Facial
Action Coding System (Ekman & Friesen, 1978). These methods included
estimation of optical flow; representations based on the second-order
statistics of the full face images, such as Eigenfaces (Cottrell & Fleming, 1990;
Turk & Pentland, 1991), local feature analysis (Penev & Atick, 1996), and linear
discriminant analysis (Belhumeur, Hespanha, & Kriegman, 1997); and
representations based on the outputs of local filters, such as Gabor
wavelet representations (Daugman, 1988; Lades et al., 1993) and local PCA
(Padgett & Cottrell, 1997). The ICA and Gabor wavelet representations
achieved the best performance of 96% for classifying 12 facial actions.
Relationships between the independent component representation and the
Gabor representation are discussed.
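
For concreteness, the following sketch illustrates a Gabor wavelet representation of the kind referred to above: an image is convolved with a bank of complex Gabor kernels at several spatial frequencies and orientations, and the response magnitudes are used as features. The kernel parameters, filter-bank size, and dense (rather than landmark-based) sampling are illustrative assumptions rather than the dissertation's settings.

    import numpy as np
    from scipy.signal import convolve2d

    def gabor_kernel(frequency, theta, sigma=3.0, size=15):
        """Complex 2-D Gabor kernel: Gaussian envelope times a complex carrier."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
        carrier = np.exp(1j * 2 * np.pi * frequency * xr)
        return envelope * carrier

    def gabor_features(image, frequencies=(0.1, 0.2, 0.3), n_orientations=4):
        """Magnitude responses of the filter bank, flattened into one feature vector."""
        feats = []
        for f in frequencies:
            for k in range(n_orientations):
                theta = k * np.pi / n_orientations
                kern = gabor_kernel(f, theta)
                real = convolve2d(image, kern.real, mode="same")
                imag = convolve2d(image, kern.imag, mode="same")
                feats.append(np.hypot(real, imag).ravel())
        return np.concatenate(feats)

    # Example on a placeholder gray-level face image:
    image = np.random.default_rng(0).random((60, 60))
    features = gabor_features(image)
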
Temporal redundancy contains information for learning
invariances. Different views of a face tend to appear in close temporal
proximity as the person changes expression or pose, or moves through the
environment. The final chapter modeled the development of viewpoint-invariant
responses to faces from visual experience in a biological system
by encoding spatio-temporal dependencies. The simulations combined
temporal smoothing of activity signals with Hebbian learning (Földiák,
1991) in a network with both feed-forward connections and a recurrent layer
that was a generalization of a Hopfield attractor network. Following
training on sequences of gray-level images of faces as they changed pose,
multiple views of a given face fell into the same basin of attraction, and
the system acquired representations of faces that were approximately
viewpoint invariant.
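
The temporal-smoothing idea can be made concrete with a small sketch of a Földiák-style trace learning rule, in which output activity is low-pass filtered over time so that temporally adjacent inputs (for example, successive views of the same face) strengthen the same output units. The network sizes, learning rate, trace decay, winner-take-all output, and placeholder data below are assumptions; the recurrent attractor layer described above is not modeled here.

    import numpy as np

    def trace_hebbian(sequences, n_outputs=4, eta=0.02, decay=0.8, seed=0):
        """Trace-modulated Hebbian learning over temporal sequences of inputs."""
        rng = np.random.default_rng(seed)
        n_inputs = sequences[0].shape[1]
        W = rng.random((n_outputs, n_inputs))
        W /= np.linalg.norm(W, axis=1, keepdims=True)
        for seq in sequences:                   # one sequence = successive views of one face
            trace = np.zeros(n_outputs)
            for x in seq:
                y = W @ x                       # feed-forward activation
                winner = np.zeros(n_outputs)
                winner[np.argmax(y)] = 1.0      # simple competitive (winner-take-all) output
                trace = decay * trace + (1 - decay) * winner      # temporally smoothed activity
                W += eta * trace[:, None] * (x[None, :] - W)      # trace-modulated Hebbian update
                W /= np.linalg.norm(W, axis=1, keepdims=True)
        return W

    # Placeholder "pose sequences": each row is one flattened view of the same face.
    rng = np.random.default_rng(1)
    sequences = [rng.random((5, 100)) for _ in range(8)]
    W = trace_hebbian(sequences)
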