CNL : Lip-reading using ICA


What is the appropriate spatial scale for image representation? In the primate visual system, receptive fields are small at early stages of processing (area V1) and larger at late stages of processing (areas MT, IT). In the current work, we explore the efficiency of local and global image representations on an automatic visual speech recognition task, using a hidden Markov model (HMM) as the recognition system. We compare local and global representations based on principal component analysis (PCA) and independent component analysis (ICA). Local representations consistently and significantly outperformed global representations in terms of generalization to new speakers.
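
To make the local/global distinction concrete, here is a minimal sketch of the principal component case only. It is an illustration under stated assumptions, not the procedure used in the paper: the random 16x16 "images", the 4x4 non-overlapping patches, the component counts, and the helper names `pca_fit` and `extract_patches` are all hypothetical. A global basis is learned from whole frames, so each basis image spans the entire frame; a local basis is learned from small patches, so each basis image covers only a small region.

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and the top-k principal directions (as rows) of X."""
    mean = X.mean(axis=0)
    # Rows of Vt from the SVD of the centered data are the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def extract_patches(images, p):
    """Cut (n, H, W) images into non-overlapping p x p patches, each flattened.

    Assumes H and W are multiples of p. Returns shape (n, num_patches, p * p).
    """
    n, H, W = images.shape
    blocks = images.reshape(n, H // p, p, W // p, p).swapaxes(2, 3)
    return blocks.reshape(n, (H // p) * (W // p), p * p)

# Synthetic stand-in for mouth-region frames (assumption: 50 frames of 16x16).
rng = np.random.default_rng(0)
images = rng.standard_normal((50, 16, 16))

# Global representation: one basis learned from whole frames.
flat = images.reshape(len(images), -1)                      # (50, 256)
g_mean, global_basis = pca_fit(flat, k=10)                  # (10, 256)
global_feats = (flat - g_mean) @ global_basis.T             # (50, 10)

# Local representation: one basis learned from all patches,
# then applied at every patch location and concatenated per frame.
patches = extract_patches(images, p=4)                      # (50, 16, 16)
l_mean, local_basis = pca_fit(patches.reshape(-1, 16), k=5) # (5, 16)
local_feats = ((patches - l_mean) @ local_basis.T).reshape(len(images), -1)  # (50, 80)
```

In a recognition pipeline, the per-frame feature vectors (`global_feats` or `local_feats`) would be the observations passed to the HMM. An ICA variant would replace `pca_fit` with an ICA estimator; that step is not shown here.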

Gray, M.S., Movellan, J.R., and Sejnowski, T.J. (1997). A comparison of local versus global image decompositions for visual speechreading. Proceedings of the 4th Annual Joint Symposium on Neural Computation, Pasadena, CA, May 17, 1997.