“…In particular, classical VADs are driven by the analysis of specific signal characteristics (Benyassine, Shlomot, Su, Massaloux, Lamblin & Petit, 1997; Yantorno, Krishnamachari, Lovekin, Benincasa & Wenndt, 2001) or rely on statistical models of the speech and noise signals (Sohn, Kim & Sung, 1999; Lee, Nakamura, Nisimura, Saruwatari & Shikano, 2004). Similarly, the more general sound localization task has been tackled by classical techniques such as Cross Spectrum Phase (CSP) (Knapp & Carter, 1976) and Steered-Response Power Phase Transform (SRP-PHAT) (Do, Silverman & Yu, 2007; Seewald, Gonzaga Jr, Veronez, Minotto & Jung, 2014; Belloch, Gonzalez, Vidal & Cobos, 2015). These techniques rely on two main stages: first, cross-correlation is used to estimate the Time Difference of Arrival (TDOA) between each microphone pair under study; then, the TDOAs are combined and jointly processed to localize the sound source.…”
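
As an illustration of the first of the two stages described above, the following is a minimal sketch of TDOA estimation for a single microphone pair using a phase-transform-weighted cross-correlation. It is not taken from the cited works: the function name `gcc_phat_tdoa`, its parameters, and the small regularization constant are assumptions chosen for this example, and the second stage (jointly processing the TDOAs of all pairs to localize the source) is not shown.

```python
import numpy as np


def gcc_phat_tdoa(x, y, fs, max_tau=None):
    """Estimate the delay of y relative to x, in seconds.

    Hypothetical helper for illustration: x and y are 1-D arrays holding the
    signals of one microphone pair, fs is the sampling rate in Hz, and
    max_tau optionally bounds the search range in seconds.
    A positive result means y lags behind x.
    """
    # Zero-pad so the circular correlation computed via the FFT
    # behaves like a linear cross-correlation.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-spectrum with phase-transform weighting: keep only the phase.
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # assumed constant to avoid division by zero
    cc = np.fft.irfft(cross, n=n)

    # Restrict the search to plausible lags.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / fs


# Usage with synthetic data: y is a copy of x delayed by 5 samples.
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
x = np.concatenate((s, np.zeros(24)))
y = np.concatenate((np.zeros(5), s, np.zeros(19)))
tau = gcc_phat_tdoa(x, y, fs=16000)
```

In a multi-microphone setup, a function of this kind would be called once per microphone pair, and the resulting delays would then feed the localization stage.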