This paper extends upon a previous work using Mean Shift algorithm to perform speaker clustering on i-vectors generated from short speech segments. In this paper we examine the effectiveness of probabilistic linear discriminant analysis (PLDA) scoring as the metric of the mean shift clustering algorithm in the presence of different number of speakers. Our proposed method, combined with k-nearest neighbors (kNN) for bandwidth estimation, yields better and more robust results in comparison to the cosine similarity with fixed neighborhood bandwidth for clustering segments of large number of speakers. In the case of 30 speakers, we achieved evaluation parameter K of 72.1 with the PLDA-based mean shift algorithm compared to 65.9 with the cosine-based baseline system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.