At a meeting on Neural Networks for Computing held April 13-16, 1986, in Snowbird, Utah, Jeanny Herault and Christian Jutten (Herault and Jutten, 1986) contributed a research paper entitled "Space or time adaptive signal processing by neural network models". They presented a recurrent neural network model and a learning algorithm, based on a version of the Hebb learning rule, that they claimed was able to blindly separate mixtures of independent signals. They demonstrated the separation of two mixed signals and also mentioned the possibility of unmixing stereoscopic visual signals from four mixtures. This paper opens a remarkable chapter in the history of signal processing, a chapter that is hardly more than ten years old.
The problem of source separation is an old one in electrical engineering and has been well studied; many algorithms exist, depending on the nature of the mixed signals. The problem of blind source separation is harder because, without knowledge of the signals that have been mixed, it is not possible to design preprocessing that is tailored to separate them optimally. The only assumption made by Herault and Jutten was independence, but additional constraints are needed on the probability distribution of the sources. If one assumes, as is often done, that the source signals are Gaussian, then it is easy to show that this problem has no general solution: any orthogonal rotation of independent Gaussian signals is again a set of independent Gaussian signals, so the original sources cannot be distinguished from rotated mixtures of them. Subsequent research showed that the Herault-Jutten network performed best when the source signals were sub-Gaussian (Cohen et al., 1992); that is, when their kurtosis was less than that of a Gaussian distribution.
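The sub-Gaussian/super-Gaussian distinction can be made concrete with the sample excess kurtosis, which is zero for a Gaussian. The function name and the example distributions below are illustrative, not taken from the papers cited above:

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(x):
    """Sample excess kurtosis: E[(x - mu)^4] / sigma^4 - 3 (zero for a Gaussian)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.mean(x**4) / np.mean(x**2) ** 2 - 3.0

uniform = rng.uniform(-1.0, 1.0, 100_000)  # sub-Gaussian: excess kurtosis < 0
laplace = rng.laplace(0.0, 1.0, 100_000)   # super-Gaussian: excess kurtosis > 0
gauss   = rng.normal(0.0, 1.0, 100_000)    # approximately zero
```

In theory the uniform distribution has excess kurtosis -1.2 and the Laplacian +3, bracketing the Gaussian at zero; the sample estimates above land close to those values.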
In the neural network field, this network model was overshadowed at the time by the more popular Hopfield network, which would soon be eclipsed in popularity by the backpropagation algorithm for multilayer perceptrons. Nonetheless, a line of research had begun that only gradually made clear the true nature of the problem. As is often the case, what matters most is not the specifics of the algorithm but the way the problem is formulated. The general framework for independent component analysis introduced by Herault and Jutten is most clearly stated by Comon (1994). Within the signal processing community, a cornucopia of ever more sophisticated algorithms was developed based on cumulants, generalizing the third-order nonlinearity first used by Herault and Jutten.
By 1994 the forefront of the neural network field had moved from supervised learning algorithms to unsupervised learning. A fast and efficient ICA algorithm was needed that could scale up with the size of the problem at least as well as backpropagation, which by this time was being used on networks with over a million weights. Tony Bell in my laboratory was working on an infomax (Linsker, 1992) approach to ICA. Tony's first results were obtained using Mathematica and a version of his algorithm that depended on inverting a matrix (Bell and Sejnowski, 1995). This was probably fortunate since the long pauses during convergence gave him ample time to think about the problem and to benefit from vigorous discussions with Nicol Schraudolph and Paul Viola, who at the time were sharing an office with a wonderful view of the Pacific Ocean. Both Nici and Paul were working on problems that involved estimating entropy gradients, so there was a keen competition to see whose algorithm would perform best. In 1996, Tony collaborated long-distance with Te-Won Lee, who at the time was visiting Carnegie Mellon University, on blind source separation of acoustically recorded sound mixtures, taking into account time delays.
Amari (1997) soon realized that the infomax ICA algorithm could be improved by using the natural gradient, which multiplies the gradient with respect to the feedforward weight matrix W by the positive definite matrix W^T W and thereby speeds up convergence by eliminating the matrix inversion. This improvement, which was independently discovered by Cardoso (1996), allows infomax ICA to be scaled up and makes it a practical algorithm for a variety of real-world problems. However, the original infomax ICA algorithm with sigmoidal nonlinearities was only suitable for super-Gaussian sources. Te-Won Lee realized that a key to generalizing the infomax algorithm to arbitrary non-Gaussian sources was to estimate moments of the source signals and to switch the nonlinearity in the algorithm appropriately. In collaboration with Mark Girolami, who had been working on similar algorithms in the context of projection pursuit, he soon developed an efficient extended version of the infomax ICA algorithm (Lee, Girolami and Sejnowski, 1998) that is suitable for general non-Gaussian signals.
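A minimal full-batch sketch of this natural-gradient update, with the per-source switch of the extended algorithm, might look as follows. The learning rate, the kurtosis-sign rule used for switching, and the toy sources are illustrative assumptions, not the published implementation:

```python
import numpy as np

def extended_infomax_step(W, X, lr=0.01):
    """One full-batch natural-gradient step of extended infomax ICA (sketch).

    W : (n, n) unmixing matrix;  X : (n, T) mixed signals.
    The natural gradient multiplies the ordinary infomax gradient by
    W^T W, which removes the matrix inversion from the update.
    """
    n, T = X.shape
    U = W @ X                                   # current source estimates
    # Switch the nonlinearity per source: k_i = +1 for super-Gaussian,
    # -1 for sub-Gaussian, here estimated from the sample excess kurtosis.
    k4 = np.mean(U**4, axis=1) / np.mean(U**2, axis=1) ** 2 - 3.0
    K = np.diag(np.sign(k4))
    # dW = lr * (I - K tanh(U) U^T / T - U U^T / T) W
    dW = lr * (np.eye(n) - (K @ np.tanh(U)) @ U.T / T - (U @ U.T) / T) @ W
    return W + dW

# Toy demo: one sub-Gaussian and one super-Gaussian unit-variance source.
rng = np.random.default_rng(0)
T = 20_000
S = np.vstack([rng.uniform(-np.sqrt(3), np.sqrt(3), T),   # sub-Gaussian
               rng.laplace(0.0, 1.0 / np.sqrt(2), T)])    # super-Gaussian
A = np.array([[1.0, 0.6], [0.4, 1.0]])                    # mixing matrix
X = A @ S
W = np.eye(2)
for _ in range(500):
    W = extended_infomax_step(W, X, lr=0.02)
U = W @ X   # recovered sources, up to permutation and scale
```

Because ICA cannot determine the order or amplitude of the sources, success is judged by whether each recovered component correlates strongly with exactly one true source.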
Several different approaches have been taken to blind source separation, including maximum likelihood, Bussgang methods based on cumulants, projection pursuit, and negentropy methods. All of these are closely related to the infomax framework (Lee, Girolami, Bell and Sejnowski, 1998). Thus, a large number of researchers who have attacked ICA from a variety of different directions are converging on a common set of principles and, ultimately, a well-understood class of algorithms. There is still much work left to do. It remains true, as Herault and Jutten noted in their 1986 paper, that "We cannot prove convergence of this algorithm because of nonlinearity of the adaptation law and nonstationarity of the signals." We still do not have an adequate explanation for why ICA converges for so many problems, almost always to the same solutions, even when the signals were not derived from independent sources.
Although the blind separation of mixtures of prerecorded signals is a useful benchmark, a more challenging problem is to apply ICA to recordings of real-world signals for which the underlying sources, if any, are unknown. An important example is the application of extended infomax ICA to electroencephalographic (EEG) recordings of scalp potentials in humans. The electrical signals originating from the brain are quite weak at the scalp, in the microvolt range, and there are larger artifactual components arising from eye movements and muscles. It has been a difficult challenge to eliminate these artifacts without altering the brain signals. ICA is ideally suited to this task, since the brain and the scalp are good volume conductors and, to a good approximation, the recordings are different linear mixtures of the brain signals and the artifacts. The extended infomax ICA algorithm has proven to be the best method yet for separating out these artifacts, which include eye blinks as well as sub-Gaussian sources such as 60 Hz line noise, from the brain signals, which are generally super-Gaussian (Jung et al., 1998). The future of this algorithm looks quite bright for biomedical applications, including the analysis of extremely large datasets from functional Magnetic Resonance Imaging (fMRI) experiments (McKeown et al., 1998).
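Under this linear mixing model, removing an artifact amounts to zeroing the corresponding rows of the estimated sources and projecting back to the channels. The sketch below assumes an unmixing matrix W has already been found by ICA; the function name and arguments are hypothetical, not the published implementation:

```python
import numpy as np

def remove_components(X, W, artifact_idx):
    """Back-project multichannel data with chosen ICA components removed (sketch).

    X : (channels, samples) scalp recordings;  W : ICA unmixing matrix.
    Zeroing rows of U = W X and back-projecting through W^{-1} subtracts
    the selected sources' contribution from every channel while leaving
    the mixture of the remaining sources intact.
    """
    U = W @ X                          # estimated source activations
    U[list(artifact_idx), :] = 0.0     # silence the artifact components
    return np.linalg.inv(W) @ U        # cleaned channel data
```

With a perfect unmixing matrix, the cleaned recording is exactly the mixture of the sources that were kept, which makes the procedure easy to sanity-check on synthetic data.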
ICA can be applied to many problems where the mixtures are not orthogonal and the source signals are not Gaussian, and most information-bearing signals have these characteristics. There are many interesting theoretical problems in ICA that have yet to be solved, and there are many new applications, such as data mining, that have yet to be explored. The theoretical framework being developed here should provide a strong foundation for future research and applications.
Terrence J. Sejnowski, August 98