Careful selection of step size parameters is often necessary to obtain good performance from gradient-based adaptive algorithms for decorrelation and source separation tasks. In this paper, we provide an overview of methods for the on-line calculation of step size parameters for these systems. A particular emphasis is placed on gradient adaptive step sizes for a class of natural gradient algorithms for decorrelation and blind source separation. Simulations verifying their useful behaviors are provided.
INTRODUCTION

Blind source separation (BSS) is the task of separating multiple statistically-independent signals from observed linear mixtures. Useful for many signal processing tasks, BSS has received much recent research attention, and numerous useful algorithms have been developed [1]-[4]. In BSS, one measures a vector sequence x(k) = [x_1(k), ..., x_n(k)]^T that is assumed to fit the model

    x(k) = H s(k),                                              (1)

where s(k) = [s_1(k), ..., s_m(k)]^T, m <= n, contains m independent source signals and H is an (n x m) mixing matrix.

In adaptive solutions to the BSS task, an output signal vector is computed as

    y(k) = W(k) x(k),                                           (2)

where W(k) is an (m x n) matrix of adaptive parameters. Then, the goal is to adjust W(k) such that the combined system matrix C(k) = W(k)H evolves as

    lim_{k -> inf} C(k) = P D,                                  (3)

where P and D are permutation and nonsingular diagonal scaling matrices, respectively, such that each s_i(k) in s(k) appears in y(k).

One particularly-useful iterative technique for BSS is the natural gradient algorithm given by

    W(k+1) = W(k) + mu(k) [I - f(y(k)) y^T(k)] W(k),            (4)

where f(y(k)) = [f_1(y_1(k)), ..., f_m(y_m(k))]^T is a vector of nonlinearly-modified output signals. This algorithm attempts to minimize the entropy-based cost function E{J(W(k))}, where

    J(W(k)) = -log |det W(k)| - sum_{i=1}^{m} log p_i(y_i(k)),  (5)

and f_i(y) = -d log p_i(y) / dy. Numerous useful properties concerning the behavior of (4) and the natural gradient have been given in the literature [3]-[6]. Perhaps the most important property possessed by (4) from the standpoint of performance is its uniform convergence behavior for arbitrary matrices H, a property known as equivariance [4].

A task related to BSS is adaptive decorrelation (AD) or prewhitening, in which x(k) has an autocorrelation matrix

    R_xx = E{x(k) x^T(k)}.                                      (6)

The goal of AD is to determine W(k) in (2) such that

    R_yy(k) = E{y(k) y^T(k)} = I.                               (7)

As many adaptive systems behave better when driven by decorrelated signals, AD is a useful preprocessing step in adaptive filtering, array processing, blind deconvolution/equalization, and multilayer perceptron training. It can be shown that choosing f(y(k)) = y(k) in (4) yields an equivariant AD algorithm.
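For illustration, the natural gradient update (4) can be sketched in a few lines of NumPy. The choice f_i(y) = tanh(y), the Laplacian source model, the mixing matrix H, and the fixed step size mu below are all illustrative assumptions, not prescriptions from the paper; f(.) is suited here to super-Gaussian sources.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: two independent Laplacian (super-Gaussian) sources
# mixed by an assumed 2 x 2 matrix H.
n_samples = 20000
s = rng.laplace(size=(2, n_samples))        # source signals s(k)
H = np.array([[1.0, 0.6],
              [0.4, 1.0]])                  # mixing matrix (assumed)
x = H @ s                                   # observed mixtures x(k) = H s(k)

W = np.eye(2)                               # separating matrix W(0)
mu = 0.01                                   # fixed step size (assumed)
for k in range(n_samples):
    y = W @ x[:, k]                         # output y(k) = W(k) x(k)
    f = np.tanh(y)                          # nonlinearity f(y(k)) (assumed)
    # Natural gradient update (4): W <- W + mu [I - f(y) y^T] W
    W = W + mu * (np.eye(2) - np.outer(f, y)) @ W

# Equivariance in action: the combined system C = W H should approach a
# permuted, diagonally-scaled identity matrix P D, as in (3).
C = W @ H
```

After the loop, each row of C is dominated by a single entry, indicating that each output recovers one source up to permutation and scale.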
Alternatively, the computationally-simple update

    W(k+1) = W(k) + mu(k) [I - y(k) y^T(k)],                    (8)

with W(0) chosen as a symmetric matrix, can be used. Performance analyses of both (4) and (8) for AD indicate that they work as desired, and the analyses provide useful information for choosing mu(k) to obtain stable, robust behavior from these schemes [7].

A critical challenge in the above tasks is the choice of step size mu(k) to achieve fast initial adaptation, a low steady-state error, and, in the case of nonstationary or time-varying signal models, good tracking performance. In this paper, we develop gradient-based methods for...
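For concreteness, the simple decorrelating update (8) can be sketched as follows. The correlated Gaussian input model, the target covariance A, and the fixed step size mu are illustrative assumptions; the paper's analysis in [7] concerns how mu(k) should actually be chosen.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical input model: white Gaussian noise coloured by the Cholesky
# factor of an assumed covariance A, so that R_xx = A.
n, n_samples = 3, 50000
A = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.3],
              [0.2, 0.3, 1.0]])
L = np.linalg.cholesky(A)

W = np.eye(n)                     # symmetric initial condition W(0)
mu = 0.002                        # fixed step size (assumed)
for k in range(n_samples):
    x = L @ rng.standard_normal(n)        # correlated input, R_xx = A
    y = W @ x                             # output y(k) = W(k) x(k)
    # Update (8): W <- W + mu [I - y y^T]; the increment is symmetric,
    # so W(k) remains symmetric for all k.
    W = W + mu * (np.eye(n) - np.outer(y, y))

# At convergence, E{y y^T} = W R_xx W^T should be near the identity (7).
Ryy = W @ A @ W.T
```

Because the increment I - y y^T is symmetric, a symmetric W(0) keeps every W(k) symmetric, and W converges toward the symmetric inverse square root of R_xx.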