Two major challenges in microphone-array-based adaptive beamforming, speech enhancement, and distant speech recognition are robust, accurate source localization and voice activity detection. This paper introduces a spatial gradient steered response power using the phase transform (SRP-PHAT) method that can localize competing speakers in overlapping conditions. We further investigate the behavior of the SRP function and theoretically characterize a fixed point in its search space for the diffuse noise field. We call this fixed point the null position in the SRP search space. Building on this evidence, we propose a technique for multichannel voice activity detection (MVAD) based on detecting a maximum power corresponding to the null position. The gradient SRP-PHAT in tandem with the MVAD forms an integrated framework for multi-source localization and voice activity detection. Experiments on real data recordings show that this framework is effective in practical hands-free communication applications.
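As background for the SRP-PHAT method named above, the following is a minimal sketch of its usual building block: the PHAT-weighted generalized cross-correlation (GCC-PHAT) for one microphone pair. A standard SRP-PHAT search sums such correlations over all pairs at the delays implied by each candidate position. This is not the paper's gradient variant; the function names (`dft`, `gcc_phat`, `estimate_delay`), the naive DFT, and the synthetic circularly shifted test signal are all assumptions made for illustration.

```python
import cmath
import math
import random


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustrative, not optimized)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def gcc_phat(x1, x2):
    """PHAT-weighted circular cross-correlation of x2 against x1, indexed by lag."""
    n = len(x1)
    X1, X2 = dft(x1), dft(x2)
    # Cross-power spectrum; this ordering puts the peak at the delay of x2.
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # Phase transform: keep only the phase, discard the magnitude.
    phat = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    # Inverse DFT back to the lag domain.
    return [
        sum(phat[k] * cmath.exp(2j * math.pi * k * lag / n) for k in range(n)).real / n
        for lag in range(n)
    ]


def estimate_delay(x1, x2):
    """Delay of x2 relative to x1 in samples (negative if x2 leads)."""
    r = gcc_phat(x1, x2)
    n = len(r)
    lag = max(range(n), key=lambda i: r[i])
    return lag if lag <= n // 2 else lag - n


# Synthetic example (assumption): white noise and a circularly delayed copy.
rng = random.Random(0)
n = 64
x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
x2 = [x1[(t - 5) % n] for t in range(n)]  # x2 lags x1 by 5 samples
print(estimate_delay(x1, x2))  # → 5
```

Because the PHAT weighting discards spectral magnitude, the correlation peak stays sharp for broadband signals, which is why it is commonly paired with steered response power searches.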
This paper studies the problem of multiple speaker localization via speech separation based on model-based sparse recovery. We compare and contrast computational sparse optimization methods that incorporate harmonicity and block structures as well as the autoregressive dependencies underlying the spectrographic representation of speech signals. The results demonstrate the effectiveness of the block sparse Bayesian learning framework incorporating autoregressive correlations in achieving highly accurate localization. Furthermore, a significant improvement is achieved when ad-hoc microphones, rather than a compact microphone array, are used for data acquisition.
In this paper, the position of a pulse width modulation (PWM)-driven pneumatic actuator is controlled using a dynamic neural network (DNN) and a proportional-integral-derivative (PID) controller. The harmony search algorithm (HSA) is used to solve the optimization problem. The DNN controller is optimally designed to control the position of the actuator. Because the PID controller can assist the DNN controller in giving better results, an optimal hybrid scheme with both DNN and PID controllers based on HSA is suggested. A pneumatic circuit containing a fast-switching valve is used to reduce the complexity and cost of the PWM-driven servo pneumatic system.
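To make the optimization step above more concrete, here is a minimal sketch of a basic harmony search loop. It is not the authors' controller design procedure: the quadratic `toy_cost` merely stands in for a controller cost (such as a tracking error), and the parameter names and default values (`memory_size`, `hmcr`, `par`, `bandwidth`) are assumptions chosen for illustration.

```python
import random


def harmony_search(cost, bounds, memory_size=10, hmcr=0.9, par=0.3,
                   bandwidth=0.05, iterations=2000, seed=0):
    """Minimize `cost` over box `bounds` with a basic harmony search."""
    rng = random.Random(seed)
    # Harmony memory: random candidate solutions and their costs.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(memory_size)]
    costs = [cost(h) for h in memory]
    for _ in range(iterations):
        new = []
        for i, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse a stored value for this variable.
                value = rng.choice(memory)[i]
                if rng.random() < par:
                    # Pitch adjustment: small random perturbation.
                    value += rng.uniform(-1.0, 1.0) * bandwidth * (hi - lo)
            else:
                # Random selection from the whole range.
                value = rng.uniform(lo, hi)
            new.append(min(hi, max(lo, value)))
        new_cost = cost(new)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(memory_size), key=lambda j: costs[j])
        if new_cost < costs[worst]:
            memory[worst], costs[worst] = new, new_cost
    best = min(range(memory_size), key=lambda j: costs[j])
    return memory[best], costs[best]


def toy_cost(params):
    """Hypothetical stand-in for a controller cost; minimum at (1, -2)."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


best, best_cost = harmony_search(toy_cost, [(-5.0, 5.0), (-5.0, 5.0)])
print(best, best_cost)
```

In a controller-tuning setting, `params` would hold the tunable gains or weights and `cost` would run a simulation of the closed loop; both are left abstract here.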
Spatial filtering is the fundamental characteristic of microphone array based signal acquisition, which plays an important role in applications such as speech enhancement and distant speech recognition. In the array processing literature, this property is formulated upon beam-pattern steering and it is characterized for narrowband signals. This paper proposes to characterize the microphone array broadband beam-pattern based on the average output of a steered beamformer for a broadband spectrum. Relying on this characterization, we derive the directivity beam-pattern of delay-and-sum and superdirective beamformers for a linear as well as a circular microphone array. We further investigate how the broadband beam-pattern is linked to speech recognition feature extraction; hence, it can be used to evaluate distant speech recognition performance. The proposed theory is demonstrated with experiments on real data recordings.
Index Terms: Broadband beam-pattern, Delay-and-sum beamformer, Superdirective beamformer, Distant speech recognition.
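The idea of a broadband beam-pattern described above can be sketched for the simplest case, a delay-and-sum beamformer on a uniform linear array under a far-field plane-wave model. The abstract does not give the exact averaging used, so averaging the narrowband power response over a frequency grid is an assumption, as are the function names, the array geometry (8 microphones, 4 cm spacing), and the 300 to 3400 Hz band.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def das_response(freq_hz, angle_rad, steer_rad, num_mics, spacing_m):
    """Narrowband delay-and-sum response of a uniform linear array."""
    # Residual inter-microphone delay after steering (far-field plane wave).
    delta = spacing_m * (math.cos(angle_rad) - math.cos(steer_rad)) / SPEED_OF_SOUND
    total = sum(cmath.exp(2j * math.pi * freq_hz * m * delta)
                for m in range(num_mics))
    return total / num_mics


def broadband_pattern(angle_rad, steer_rad, freqs_hz, num_mics=8, spacing_m=0.04):
    """Average of the narrowband power response over a frequency grid."""
    powers = [abs(das_response(f, angle_rad, steer_rad, num_mics, spacing_m)) ** 2
              for f in freqs_hz]
    return sum(powers) / len(powers)


# Example: steer to broadside (90 degrees) and scan a few arrival angles.
freqs = list(range(300, 3500, 100))
steer = math.pi / 2
for deg in (0, 45, 90):
    gain = broadband_pattern(math.radians(deg), steer, freqs)
    print(deg, round(gain, 3))
```

The pattern equals 1 in the steering direction and falls below 1 elsewhere; how quickly it falls off with angle reflects the array's broadband spatial selectivity under these assumptions.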