Abstract-In this paper we propose a new binaural beamforming technique which can be seen as a relaxation of the linearly constrained minimum variance (LCMV) framework. The proposed method can achieve simultaneous noise reduction and exact binaural cue preservation of the target source, similar to the binaural minimum variance distortionless response (BMVDR) method. However, unlike the BMVDR, the proposed method is also able to preserve the binaural cues of multiple interferers to a certain predefined accuracy. Specifically, it can control the trade-off between noise reduction and binaural cue preservation of the interferers by using a separate trade-off parameter per interferer. Moreover, we provide a robust way of selecting these trade-off parameters such that the preservation accuracy for the binaural cues of the interferers is always better than that of the BMVDR. By relaxing the constraints, the proposed method achieves approximate binaural cue preservation of more interferers than previously presented LCMV-based binaural beamforming methods that use strict equality constraints.
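The binaural cues discussed in this and the following abstracts are usually quantified through the interaural transfer function (ITF) and the interaural level and phase differences derived from it. The sketch below shows these standard definitions for a pair of left/right beamformers; the reference-microphone indices and all variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def itf(wL, wR, a):
    """Output ITF of a source with RATF `a` under left/right beamformers
    wL, wR: ratio of the two beamformer responses to that source."""
    return (wL.conj() @ a) / (wR.conj() @ a)

def ild_db(itf_val):
    """Interaural level difference in dB (magnitude of the ITF)."""
    return 20 * np.log10(np.abs(itf_val))

def ipd(itf_val):
    """Interaural phase difference in radians (phase of the ITF)."""
    return np.angle(itf_val)

# Toy check: with trivial "beamformers" that just pick the left/right
# reference microphones (indices 0 and 1 here, an assumption), the output
# ITF equals the input ITF, i.e. the cues are trivially preserved.
M = 4
rng = np.random.default_rng(1)
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # illustrative RATF
eL = np.zeros(M, complex); eL[0] = 1   # left reference microphone selector
eR = np.zeros(M, complex); eR[1] = 1   # right reference microphone selector
in_itf = a[0] / a[1]                   # input ITF at the reference microphones
print(np.isclose(itf(eL, eR, a), in_itf))
```

Cue-preservation constraints in the LCMV-based methods above enforce (exactly or approximately) that this output ITF matches the input ITF for each source of interest.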
We propose a new robust distributed linearly constrained beamformer which utilizes a set of linear equality constraints to reduce the cross power spectral density matrix to a block-diagonal form. The proposed beamformer has a convenient objective function for use in arbitrary distributed network topologies while having identical performance to a centralized implementation. Moreover, the new optimization problem is robust to relative acoustic transfer function (RATF) estimation errors and to target activity detection (TAD) errors. Two variants of the proposed beamformer are presented and evaluated in the context of multi-microphone speech enhancement in a wireless acoustic sensor network, and are compared with other state-of-the-art distributed beamformers in terms of communication costs and robustness to RATF estimation errors and TAD errors.
One of the biggest challenges in multi-microphone applications is the estimation of the parameters of the signal model, such as the power spectral densities (PSDs) of the sources, the early (relative) acoustic transfer functions of the sources with respect to the microphones, the PSD of the late reverberation, and the PSDs of the microphone self-noise. Typically, existing methods estimate subsets of the aforementioned parameters and assume some of the remaining parameters to be known a priori. This may result in inconsistencies, inaccurately estimated parameters, and potential performance degradation in the applications using these estimates. To date, there is no method that jointly estimates all the aforementioned parameters. In this paper, we propose a robust method for jointly estimating all of them using confirmatory factor analysis. The estimation accuracy of the signal-model parameters thus obtained outperforms existing methods in most cases. We experimentally show significant performance gains in several multi-microphone applications over state-of-the-art methods.
We propose a new multi-microphone noise reduction technique for binaural cue preservation of the desired source and the interferers. This method is based on the linearly constrained minimum variance (LCMV) framework, where the constraints are used for the binaural cue preservation of the desired source and of multiple interferers. In this framework there is a trade-off between noise reduction and binaural cue preservation: the more constraints the LCMV uses for preserving binaural cues, the fewer degrees of freedom remain for noise suppression. The recently presented binaural LCMV (BLCMV) method and the optimal BLCMV (OBLCMV) method require two constraints per interferer and introduce an additional interference rejection parameter. This unnecessarily reduces the degrees of freedom available for noise reduction and negatively influences the trade-off between noise reduction and binaural cue preservation. With the proposed method, binaural cue preservation is obtained using just a single constraint per interferer, without the need for an interference rejection parameter. The proposed method can simultaneously achieve noise reduction and perfect binaural cue preservation of more than twice as many interferers as the BLCMV, while the OBLCMV can preserve the binaural cues of only one interferer.
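The LCMV framework underlying these methods has a standard closed-form solution: minimize the output noise power w^H R w subject to linear constraints A^H w = f, where each column of A typically holds an RATF and f the desired responses. A minimal numerical sketch, assuming a known covariance matrix and constraint set (the toy data below is illustrative, not from the paper):

```python
import numpy as np

def lcmv_weights(R, A, f):
    """Closed-form LCMV solution: w = R^{-1} A (A^H R^{-1} A)^{-1} f.

    R : (M, M) noise(-plus-interference) covariance, Hermitian positive definite
    A : (M, K) constraint matrix (e.g. RATFs of the target and interferers)
    f : (K,)   desired response per constraint (e.g. 1 for distortionless)
    """
    Ri_A = np.linalg.solve(R, A)                     # R^{-1} A
    return Ri_A @ np.linalg.solve(A.conj().T @ Ri_A, f)

# Toy example: 4 microphones, a single distortionless constraint toward
# a randomly drawn steering vector standing in for the target RATF.
rng = np.random.default_rng(0)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
X = rng.standard_normal((M, 100)) + 1j * rng.standard_normal((M, 100))
R = X @ X.conj().T / 100 + np.eye(M)                 # well-conditioned covariance
w = lcmv_weights(R, a[:, None], np.array([1.0 + 0j]))
print(np.isclose(a.conj() @ w, 1.0))                 # constraint a^H w = 1 holds
```

With K constraints, K of the M degrees of freedom are spent on the equalities and only M - K remain for minimizing the noise power, which is exactly the trade-off the abstract describes when each interferer consumes one or two constraints.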
Abstract-We propose a fast speech analysis method which simultaneously performs high-resolution voiced/unvoiced detection (VUD) and accurate estimation of glottal closure and glottal opening instants (GCIs and GOIs, respectively). The proposed algorithm exploits the structure of the glottal flow derivative in order to estimate GCIs and GOIs only in voiced speech using simple time-domain criteria. We compare our method with well-known GCI/GOI methods, namely, the dynamic programming projected phase-slope algorithm (DYPSA), the yet another GCI/GOI algorithm (YAGA), and the speech event detection using the residual excitation and a mean-based signal (SEDREAMS). Furthermore, we examine the performance of the aforementioned methods when combined with state-of-the-art VUD algorithms, namely, the robust algorithm for pitch tracking (RAPT) and the summation of residual harmonics (SRH). Experiments conducted on the APLAWD and SAM databases show that the proposed algorithm outperforms the state-of-the-art combinations of VUD and GCI/GOI algorithms with respect to almost all evaluation criteria for clean speech. Experiments on speech contaminated with several noise types (white Gaussian, babble, and car-interior) are also presented and discussed. The proposed algorithm outperforms the state-of-the-art combinations in most evaluation criteria for signal-to-noise ratios greater than 10 dB.

Index Terms-Glottal closure instants (GCIs), glottal opening instants (GOIs), pitch estimation, speech analysis, voiced/unvoiced detection (VUD).
Abstract-Binaural beamformers (BFs) aim to reduce the output noise power while simultaneously preserving the binaural cues of all sources. Typically, the latter is accomplished via constraints relating the output and input interaural transfer functions (ITFs). The ITF is a function of the corresponding relative acoustic transfer function (RATF), which implies that RATF estimates of all sources in the acoustic scene are required. Here, we propose an alternative way to approximately preserve the binaural cues of the entire acoustic scene without estimating RATFs. We propose to preserve the binaural cues of all sources with a set of fixed pre-determined RATFs distributed around the head. Two recently proposed binaural BFs are evaluated in the context of using pre-determined RATFs and compared to the binaural minimum variance distortionless response BF, which can only preserve the binaural cues of the target.
Abstract-Binaural multi-microphone noise reduction methods aim at noise suppression while preserving the spatial impression of the acoustic scene. Recently, a new binaural speech enhancement method was proposed which chooses, per time-frequency (TF) tile, either the enhanced target or a suppressed noisy version. The selection between the two is based on the input SNR per TF tile. In this paper we modify this method such that the selection mechanism is based on the output SNR. The proposed modification of deciding which TF tile is target- or noise-dominated leads to choices that are better aligned with the simultaneous masking properties of the auditory system, and hence improves the performance over the initial version of the algorithm.
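The per-tile selection mechanism described above can be sketched as a simple mask over the TF grid. The SNR estimator, the 0 dB threshold, and all names below are illustrative assumptions; the paper's exact decision rule differs in its details.

```python
import numpy as np

def select_tf_tiles(enhanced, suppressed_noisy, out_snr_db, thr_db=0.0):
    """Per TF tile, keep the enhanced target where the estimated output SNR
    exceeds a threshold; elsewhere fall back to the suppressed noisy signal.

    All inputs are (freq, time) arrays of spectral values / SNRs in dB.
    """
    mask = out_snr_db > thr_db                        # True -> target-dominated
    return np.where(mask, enhanced, suppressed_noisy), mask

# Toy 2x3 TF grid of spectral magnitudes with illustrative SNR estimates.
enh = np.full((2, 3), 1.0)    # enhanced-target magnitudes
sup = np.full((2, 3), 0.1)    # suppressed noisy magnitudes
snr = np.array([[5.0, -3.0, 12.0],
                [-8.0, 2.0, -1.0]])
out, mask = select_tf_tiles(enh, sup, snr)
print(mask)   # which tiles were classified as target-dominated
```

Switching the SNR in `out_snr_db` from an input-SNR to an output-SNR estimate is the modification the abstract proposes; the selection structure itself stays the same.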