Audio-Visual Calibration with Polynomial Regression for 2-D Projection Using SVD-PHAT

Grondin, François; Tang, Hao; Glass, James

doi:10.1109/icassp40776.2020.9054690

Cited by 3 publications

(2 citation statements)

References 19 publications

“…François Grondin,[51] Proposed a simple 2-D approach by creating and overlaying the acoustic image with the visual field of a camera with the auditory field of an array microphone. Polynomial regression can effectively resolve non-linear video distortion using a low-cost microphone array and off-stage camera and that SVD-PHAT, a newly suggested approach for real-time analysis of sound sources can be tailored for this role.…”

Section: III. A Review On (Linear Regression) (mentioning)

confidence: 99%

A Review on Linear Regression Comprehensive in Machine Learning

Maulud

Abdulazeez

2020

JASTT

597

145

Linear regression is perhaps one of the most common and comprehensive statistical and machine learning algorithms. It is used to find a linear relationship between a response variable and one or more predictors. Linear regression has two types: simple linear regression and multiple linear regression (MLR). This paper discusses various works by different researchers on linear regression and polynomial regression and compares their performance using the best approach to optimize prediction and precision. Almost all of the articles analyzed in this review focus on datasets; to determine a model's efficiency, its output must be correlated with the actual values obtained for the explanatory variables.

“…In this scenario, there are two speech sources, the target and interference, and it is assumed that these sources have different DOAs. SteerNet assumes that the DOA of the target speech is available and is obtained using sound source localization methods [26,27,28], or using a visual cue when both optical and acoustic images are properly aligned [29].…”

Section: SteerNet (mentioning)

confidence: 99%

GEV Beamforming Supported by DOA-based Masks Generated on Pairs of Microphones

Grondin

Lauzon

Vincent

et al. 2020

Preprint

Self Cite

Distant speech processing is a challenging task, especially when dealing with the cocktail party effect. Sound source separation is thus often required as a preprocessing step prior to speech recognition to improve the signal-to-distortion ratio (SDR). Recently, a combination of beamforming and speech separation networks has been proposed to improve the target source quality in the direction of arrival of interest. However, with this type of approach, the neural network needs to be trained in advance for a specific microphone array geometry, which limits versatility when adding or removing microphones, or changing the shape of the array. The solution presented in this paper is to train a neural network on pairs of microphones with different spacing and acoustic environmental conditions, and then use this network to estimate a time-frequency mask from all the pairs of microphones forming an array of arbitrary shape. Using this mask, the target and noise covariance matrices can be estimated and then used to perform generalized eigenvalue (GEV) beamforming. Results show that the proposed approach improves the SDR from 4.78 dB to 7.69 dB on average, for various microphone array geometries that correspond to commercially available hardware.
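The mask-to-beamformer step described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single frequency bin, synthetic data, and an oracle mask standing in for the neural-network estimate. The names `gev_beamformer` and `rayleigh_snr` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def gev_beamformer(X, mask):
    """Sketch of mask-based GEV beamforming for one frequency bin.

    X    : (mics, frames) complex STFT coefficients (assumed layout).
    mask : (frames,) values in [0, 1]; 1 means target-dominated frame.
    """
    n_mics = X.shape[0]
    # Mask-weighted spatial covariance matrices for target and noise.
    R_target = (mask * X) @ X.conj().T / max(mask.sum(), 1e-8)
    R_noise = ((1.0 - mask) * X) @ X.conj().T / max((1.0 - mask).sum(), 1e-8)
    # Small diagonal loading keeps the noise covariance positive definite.
    R_noise = R_noise + 1e-6 * np.trace(R_noise).real / n_mics * np.eye(n_mics)
    # Solve R_target w = lambda R_noise w by whitening with R_noise = L L^H.
    L = np.linalg.cholesky(R_noise)
    L_inv = np.linalg.inv(L)
    A = L_inv @ R_target @ L_inv.conj().T
    A = 0.5 * (A + A.conj().T)  # enforce Hermitian symmetry numerically
    _, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    w = L_inv.conj().T @ eigvecs[:, -1]  # principal generalized eigenvector
    return w, R_target, R_noise

def rayleigh_snr(w, R_target, R_noise):
    """Ratio of target to noise power at the beamformer output."""
    return (w.conj() @ R_target @ w).real / (w.conj() @ R_noise @ w).real

# Synthetic example: 4 microphones, target active in the first 100 frames.
rng = np.random.default_rng(0)
n_mics, n_frames = 4, 200
steering = np.exp(-1j * np.pi * 0.3 * np.arange(n_mics))  # assumed steering vector
active = (np.arange(n_frames) < 100).astype(float)
source = rng.standard_normal(n_frames) + 1j * rng.standard_normal(n_frames)
noise = 0.5 * (rng.standard_normal((n_mics, n_frames))
               + 1j * rng.standard_normal((n_mics, n_frames)))
X = np.outer(steering, source * active) + noise

w, R_target, R_noise = gev_beamformer(X, active)  # oracle mask as a stand-in
y = w.conj() @ X  # beamformed output, one value per frame

e0 = np.zeros(n_mics, dtype=complex)
e0[0] = 1.0  # "beamformer" that just picks microphone 0
snr_gev = rayleigh_snr(w, R_target, R_noise)
snr_mic0 = rayleigh_snr(e0, R_target, R_noise)
```

Because the GEV weights maximize the ratio of target to noise output power, `snr_gev` is never lower than `snr_mic0` for the same pair of covariance matrices. In practice this would be repeated for every frequency bin, and GEV outputs typically need an additional post-filter to limit spectral distortion.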
