2007
DOI: 10.1109/tasl.2007.902460

Acoustic Beamforming for Speaker Diarization of Meetings

Abstract: When performing speaker diarization on recordings from meetings, multiple microphones of different qualities are usually available and distributed around the meeting room. Although several approaches have been proposed in recent years to take advantage of multiple microphones, they are either too computationally expensive and not easily scalable or they cannot outperform the simpler case of using the best single microphone. In this work the use of classic acoustic beamforming techniques is proposed t…

Cited by 377 publications (253 citation statements)
References 23 publications
“…This constraint is valid for the CHiME-3 simulated data, which are simulated using a pure delay filter, but it does not hold anymore on real data. Indeed, early reflections (and to a lesser extent reverberation) modify the apparent speaker direction at each frequency, which results … [Table 4: WER (%) achieved by beamforming and spatial post-filtering applied on all channels except ch2 using the GMM backend retrained on enhanced real and simulated data] … (Prudnikov et al, 2015 … (Anguera et al, 2007) which was used in many challenge submissions do not suffer from this issue due to the fact that their spatial response decays slowly in the neighborhood of the estimated speaker direction. Modern adaptive beamformers such as MCA or the mask-based MVDR beamformer of Yoshioka et al (2015) do not suffer from this issue either, due to the fact that they estimate the relative (inter-microphone) transfer function instead of the direction-of-arrival.…”
Section: 1 Beamforming and Post-filtering (mentioning)
confidence: 99%
“…We investigate the resulting impact on the ASR performance when testing on multichannel data. To do so, we use the variant of DS beamforming implemented in BeamformIt (Anguera et al, 2007), which was used by many challenge entrants and was found to be among the best enhancement techniques for this corpus. We evaluate the resulting ASR performance using the updated DNN-based official baseline, similar to that in Section 4.1.…”
Section: Number Of Microphones (mentioning)
confidence: 99%
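
For readers unfamiliar with the delay-and-sum (DS) beamforming mentioned in the statement above, a minimal Python sketch of the basic idea follows. It is not BeamformIt's implementation: the function name `delay_and_sum`, the use of integer sample delays, and the unweighted average are all assumptions made for illustration, and the delays are taken as already known.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Align each channel by an integer delay (in samples) and average.

    channels: array of shape (n_mics, n_samples)
    delays:   one integer per channel, relative to a reference channel;
              a positive delay means the channel lags the reference.
    Assumes abs(delay) < n_samples for every channel (illustrative limit).
    """
    channels = np.asarray(channels, dtype=float)
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for x, d in zip(channels, delays):
        aligned = np.zeros(n_samples)
        if d >= 0:
            # Channel lags the reference: shift it earlier by d samples.
            aligned[: n_samples - d] = x[d:]
        else:
            # Channel leads the reference: shift it later by |d| samples.
            aligned[-d:] = x[: n_samples + d]
        out += aligned
    return out / n_mics
```

Once the channels are aligned, the speech adds coherently while uncorrelated noise partially averages out, which is the property the simple DS approach relies on.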
“…1 Although X_1(ω, f) is being used as the reference channel, the selection is studied previously, for example in [9]. Therefore, the proposed method can be extended.…”
Section: A. Estimation of Number of Sound Sources, 1) TDOA Estimation (mentioning)
confidence: 99%
“…However, most of the existing methods assume that the microphone location is given to estimate the direction of arrival of speakers [3]-[6]. Some methods using Time Difference Of Arrival (TDOA) have been proposed [7]-[9], which do not assume the known microphone location. These methods propose using HMM for speaker segmentation and clustering, as well as hierarchical agglomerative clustering using spatial information.…”
Section: Introduction (mentioning)
confidence: 99%
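
As an illustration of how a TDOA can be estimated without knowing the microphone positions, the sketch below uses generalized cross-correlation with phase transform (GCC-PHAT), a common choice for this task. It is an assumption-laden example rather than the method of any cited paper: the function name `gcc_phat_delay`, the integer-sample resolution, and the `max_delay` search window (assumed to be at least 1) are hypothetical.

```python
import numpy as np

def gcc_phat_delay(x, ref, max_delay):
    """Estimate the delay of x relative to ref, in samples, with GCC-PHAT.

    A positive result means x lags ref. Assumes max_delay >= 1 and
    max_delay smaller than the signal length (illustrative limits).
    """
    n = len(x) + len(ref)            # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    R = np.fft.rfft(ref, n=n)
    cross = X * np.conj(R)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12   # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    # Reorder so that the lags run from -max_delay to +max_delay.
    cc = np.concatenate([cc[-max_delay:], cc[: max_delay + 1]])
    return int(np.argmax(cc)) - max_delay
```

Because only the relative delay between two channels is estimated, no array geometry is needed; the delay between each microphone and a chosen reference channel can serve directly as a spatial feature.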