Abstract:We study a novel neural speaker encoder and its training strategies for speaker recognition without using any identity labels. The speaker encoder is trained to extract a fixed dimensional speaker embedding from a spoken utterance of variable length. Contrastive learning is a typical self-supervised learning technique. However, the contrastive learning of the speaker encoder depends very much on the sampling strategy of positive and negative pairs. It is common that we sample a positive pair of segments from t… Show more
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.