Speech enhancement and separation are core problems in audio signal processing, with commercial applications in devices as diverse as mobile phones, conference call systems, hands-free systems, and hearing aids. In addition, they are crucial pre-processing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. The enhancement and separation capabilities offered by these multichannel interfaces are usually greater than those of single-channel interfaces. Research in speech enhancement and separation has followed two convergent paths, starting with microphone array processing and blind source separation, respectively. These communities are now strongly interrelated and routinely borrow ideas from each other. Yet a comprehensive overview of the common foundations of these approaches and the differences between them is currently lacking. In this article, we propose to fill this gap by analyzing a large number of established and recent techniques according to four transverse axes: (a) the acoustic impulse response model, (b) the spatial filter design criterion, (c) the parameter estimation algorithm, and (d) optional postfiltering. We conclude this overview paper by providing a list of software and data resources and by discussing perspectives and future trends in the field.
In multiple-speaker scenarios, the linearly constrained minimum variance (LCMV) beamformer is a popular microphone array-based speech enhancement technique, as it minimizes the noise power while maintaining a set of desired responses towards the different speakers. In this paper, we address the algorithmic challenges arising in the application of the LCMV beamformer in wireless acoustic sensor networks (WASNs), which are a next-generation technology for audio acquisition and processing. We review three optimal distributed LCMV-based algorithms, which compute a network-wide LCMV beamformer output at each node without centralizing the microphone signals. Optimality here refers to the fact that the algorithms theoretically generate the same beamformer outputs as a centralized realization in which a single processor has access to all the signals. We derive and motivate the algorithms in an accessible top-down framework that reveals the underlying relations between them, as well as their differences. We explain how these differences result from their different design criteria (node-specific versus common constraint sets), as well as their different priorities with respect to communication bandwidth, computational power, and adaptivity. Furthermore, although the three algorithms were originally proposed for a fully connected WASN, we also explain how they can be extended to the case of a partially connected WASN, which is assumed to be pruned to a tree topology. Finally, we discuss the advantages and disadvantages of the various algorithms.
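To make the LCMV criterion concrete, the following is a minimal sketch of the standard centralized closed-form solution, w = R^{-1} C (C^H R^{-1} C)^{-1} f, which minimizes the output noise power w^H R w subject to the linear constraints C^H w = f. It is not one of the distributed algorithms reviewed in the paper. The array size, the random steering vectors in C, the noise covariance R, and the desired responses f are all hypothetical values chosen only for illustration.

```python
import numpy as np

def lcmv_weights(R, C, f):
    """Closed-form LCMV weights: w = R^{-1} C (C^H R^{-1} C)^{-1} f."""
    Rinv_C = np.linalg.solve(R, C)            # R^{-1} C, shape (M, K)
    gram = C.conj().T @ Rinv_C                # C^H R^{-1} C, shape (K, K)
    return Rinv_C @ np.linalg.solve(gram, f)  # weight vector, shape (M,)

rng = np.random.default_rng(0)
M, K = 6, 2  # hypothetical: 6 microphones, 2 constrained speakers

# Hypothetical steering vectors, one column per speaker.
C = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))

# Hypothetical noise covariance, built to be Hermitian positive definite.
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R = A @ A.conj().T + M * np.eye(M)

# Desired responses: pass speaker 1 undistorted, null speaker 2.
f = np.array([1.0, 0.0], dtype=complex)

w = lcmv_weights(R, C, f)
response = C.conj().T @ w                 # equals f up to rounding error
noise_power = np.real(w.conj() @ R @ w)   # minimized output noise power
```

In a centralized realization, R and C would be estimated from all microphone signals at a single processor; the distributed algorithms discussed in the abstract aim to reach the same beamformer output without gathering those signals in one place.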