CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning

Stöter, Fabian-Robert; Chakrabarty, Soumitro; Edler, Bernd; Habets, Emanuël A. P.

doi:10.1109/taslp.2018.2877892

Cited by 45 publications

(55 citation statements)

References 66 publications

“…ML methods can be divided into two types (Zitnik et al, 2019), supervised learning and unsupervised learning. Supervised learning (Stoter et al, 2019) requires that the model be trained using a training set. The training sets for supervised learning include features and results.…”

Section: Machine Learning Methods (mentioning)

confidence: 99%

Application of Machine Learning in Microbiology

et al. 2019

Microorganisms are ubiquitous and closely related to people’s daily lives. Since they were first discovered in the 19th century, researchers have shown great interest in microorganisms. People studied microorganisms through cultivation, but this method is expensive and time consuming, and it cannot keep pace with the development of high-throughput sequencing technology. To deal with this problem, machine learning (ML) methods have been widely applied to the field of microbiology. Literature reviews have shown that ML can be used in many aspects of microbiology research, especially classification problems, and for exploring the interaction between microorganisms and the surrounding environment. In this study, we summarize the application of ML in microbiology.

“…Speaker counting can be formulated as an (N + 1)-classes classification problem with N the maximum possible number of overlapping speakers [21]. While this approach is not the only one for supervised speaker counting, it has been found to be the most effective [22], provided the maximum possible number is known.…”

Section: Overlapped Speech Detection and Counting Task (mentioning)

confidence: 99%
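The (N + 1)-class formulation in the statement above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the logits are hypothetical values standing in for a trained network's output, `max_speakers = 4` is an arbitrary choice of N, and softmax followed by argmax is one common way to decode a classifier, not necessarily the procedure used in the cited works.

```python
import math

def softmax(logits):
    """Convert raw scores into class posteriors (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def count_speakers(logits):
    """Decode an (N + 1)-class output: class k means 'k concurrent speakers'."""
    probs = softmax(logits)
    # The index of the most probable class is the estimated speaker count.
    k = max(range(len(probs)), key=probs.__getitem__)
    return k, probs

max_speakers = 4                      # assumed N; classes are 0, 1, ..., N
logits = [0.1, 0.3, 2.5, 0.7, -1.0]   # hypothetical network output, length N + 1
count, probs = count_speakers(logits)
print(count)
```

Because the output layer has a fixed size, any mixture with more than N speakers cannot be represented, which is why the statement adds the proviso that the maximum possible number must be known.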

“…In parallel, Stöter et al [21] have shown that a neural network can be trained to estimate the number of concurrent speakers rather than simply performing joint VAD+OSD. This approach has been further expanded in [22] where three different output distributions for this speaker counting problem are proposed, different neural architectures are explored, and the performance is compared with humans. Also, in [23], a deep learning based speaker counting algorithm was evaluated against…”

Section: Introduction (mentioning)

confidence: 99%

“…Building on these previous works, we study supervised joint VAD+OSD and speaker counting in distant speech scenarios. We propose a Temporal Convolutional Network (TCN) architecture for these tasks, and evaluate it against previous works on joint VAD+OSD [17] and speaker counting [22] on AMI and CHiME-6 [14]. Because, to the best of our knowledge, supervised speaker counting has never been studied on real-world data, we also explore how the class imbalance problem can be mitigated using data augmentation.…”

Section: Introduction (mentioning)

confidence: 99%

Detecting and Counting Overlapping Speakers in Distant Speech Scenarios

Cornell, Omologo, Squartini

2020

Interspeech 2020

“…In addition, various methods based on deep learning on counting the NoS have emerged, such as [18]- [21]. In [22], a new NoS estimation architecture is provided via combining the convolutional recurrent neural networks and adequate input features of speeches, which is designed to improve the performance of NoS estimation from the single channel mixtures.…”

Section: Introduction (mentioning)

confidence: 99%

Estimating Number of Speakers via Density-Based Clustering and Classification Decision

et al. 2019

It is crucial to robustly estimate the number of speakers (NoS) from the recorded audio mixtures in a reverberant environment. Some popular time-frequency (TF) methods approach this NoS estimation problem by assuming that only one of the speech components is active at each TF slot. However, this condition is violated in many scenarios where the speeches are convolved with long length of room impulse response coefficients, which causes degenerated performance of NoS estimation. To tackle this problem, a density-based clustering strategy is proposed to estimate NoS based on a local dominance assumption of speeches. Our method consists of several steps from clustering to classification of speakers with the consideration of robustness. First, the leading eigenvectors are extracted from the local covariance matrices of mixture TF components and ranked by the combination of local density and minimum distance to other leading eigenvectors with higher density. Second, a gap-based method is employed to determine the cluster centers from the ranked leading eigenvectors at each frequency bin. Third, a criterion based on averaged volume of cluster centers is proposed to select reliable clustering results at some frequency bins for the classification decision of NoS. The experiment results demonstrate that the proposed algorithm is superior to the existing methods in various reverberation cases with noise-free condition or noise condition.

Index Terms: Number of speakers, speeches, reverberation, audio source separation (ASS), local dominance, density-based clustering.
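The ranking and gap steps described in this abstract resemble density-peak clustering, and a rough sketch may help make them concrete. This is not the authors' algorithm: it uses plain 2-D points in place of leading eigenvectors, a neighbour count within a hypothetical `cutoff` as the local density, the product of density and distance as the ranking score, and the largest drop in sorted scores as the gap criterion. All of these choices, and the function name `density_peak_count`, are assumptions.

```python
import math

def density_peak_count(points, cutoff):
    """Estimate the number of cluster centers from density and distance scores."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Local density: number of other points closer than the cutoff (assumed definition).
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff)
           for i in range(n)]
    # Order points from highest to lowest density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point has no denser neighbour: use its largest distance.
            delta[i] = max(dist[i])
        else:
            # Minimum distance to any point ranked as denser.
            delta[i] = min(dist[i][j] for j in order[:rank])
    # Rank by density times distance, then cut at the largest gap (assumed rule).
    scores = sorted((r * d for r, d in zip(rho, delta)), reverse=True)
    gaps = [scores[k] - scores[k + 1] for k in range(n - 1)]
    return gaps.index(max(gaps)) + 1

# Two well-separated toy clusters of four points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
n_centers = density_peak_count(points, cutoff=0.5)
print(n_centers)
```

In the abstract's setting this count would be obtained per frequency bin, and only the bins judged reliable would contribute to the final NoS decision; that selection step is not sketched here.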
