This paper is concerned with verification effectiveness in open-set, text-independent speaker identification. The study includes an analysis of the characteristics of this mode of speaker recognition and the potential causes of errors. The use of well-known score normalisation techniques for the purpose of enhancing the reliability of the process is described, and their relative effectiveness is experimentally investigated. The experiments are based on the dataset proposed for the 1-speaker detection task of the NIST Speaker Recognition Evaluation 2003. Based on the experimental results, it is demonstrated that significant benefits are achieved by using score normalisation in open-set identification, and that the level of benefit depends strongly on the type of approach adopted. The results also show that better performance can be achieved by using the cohort normalisation methods. In particular, the unconstrained cohort method with a relatively small cohort size appears to outperform all other approaches.
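As an illustration of the cohort idea mentioned above, the following is a minimal sketch of unconstrained cohort normalisation. It assumes the commonly described form, in which the cohort is chosen at test time as the highest-scoring competing models and their mean log-likelihood is subtracted from the score of the hypothesised speaker. The function name, argument names and default cohort size are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ucn_score(target_llk, competitor_llks, cohort_size=3):
    """Normalise a target log-likelihood against a test-dependent cohort.

    target_llk      -- log-likelihood of the test utterance under the
                       hypothesised speaker's model
    competitor_llks -- log-likelihoods of the same utterance under all
                       other registered speaker models
    cohort_size     -- number of top-scoring competitors to use (assumed)
    """
    if not 1 <= cohort_size <= len(competitor_llks):
        raise ValueError("cohort_size must be between 1 and the number of competitors")
    # Unconstrained selection: the cohort is whichever competing models
    # score highest on this particular test utterance.
    cohort = sorted(competitor_llks, reverse=True)[:cohort_size]
    return target_llk - mean(cohort)


# Example with made-up log-likelihoods and a cohort of two models.
normalised = ucn_score(-10.0, [-12.0, -11.0, -20.0, -15.0], cohort_size=2)
```

The normalised score can then be compared with a threshold to accept or reject the best-matched speaker; a small `cohort_size` keeps the normalisation focused on the closest competitors.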
This paper presents investigations into the ability of speaker verification technology to discriminate between identical twins. It is shown that whilst, in general, the genetic and non-genetic characteristics of voice are both of value to speaker verification capabilities, it is the latter which is highly beneficial in the separation of the speech of identical twins. It is further demonstrated that through the use of unconstrained cohort normalisation as a complementary means for the exploitation of such voice characteristics, the verification reliability can be considerably enhanced for both identical twins and unrelated speakers. Experiments were conducted using a bespoke clean-speech database consisting of utterances from forty-nine identical twin pairs. The paper details the problem in speaker verification posed by identical twins, discusses the experimental investigations and provides an analysis of the results.
A new approach to speaker change detection is proposed and investigated. The method, which is based on a probabilistic framework, provides an effective means for tackling the problem posed by phonetic variation in high-resolution speaker change detection. Additionally, the approach incorporates the capability for dealing with undesired effects of variations in speech characteristics. Through experimental investigations conducted with clean and broadcast news audio, it is shown that the proposed method is significantly more effective than the currently popular techniques for speaker change detection. To enhance the computational efficiency of the proposed method, modified implementation algorithms are introduced which are based on the exploitation of redundant operations and a fast scoring procedure. It is shown that, through the use of the proposed fast algorithm, the computational efficiency of the approach can be increased by over 77% without significant reduction in its accuracy. The paper discusses the principles and characteristics of the proposed speaker change detection method, and provides a detailed description of its efficient implementation. The experiments, investigating the performance of the proposed method and its effectiveness in relation to other approaches, are described and an analysis of the results is presented.
This letter presents an investigation into the use of a probabilistic pattern matching approach for detecting speaker changes in audio streams. The experiments are conducted using clean speech as well as broadcast news material. It is shown that, in the proposed approach, the use of bilateral scoring is considerably more effective than unilateral scoring. Appropriate score normalization methods are considered in the study. It is observed that in all the cases, the bilateral scoring approach outperforms the currently popular method of Bayesian information criterion (BIC) for speaker change detection. This letter discusses the principles of the proposed approach and details the experimental investigations.
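For context on the BIC baseline named above, the following is a minimal sketch of the standard ΔBIC test for a candidate change point. It is simplified to one-dimensional features modelled by single Gaussians, so that it runs with the standard library alone. It illustrates the comparison method only, not the bilateral scoring approach of the letter, and the function name and `penalty_weight` parameter are hypothetical.

```python
import math
from statistics import pvariance


def delta_bic(left, right, penalty_weight=1.0):
    """Return the BIC difference for splitting left + right at their boundary.

    A positive value favours two separate Gaussians (a speaker change);
    a negative value favours one Gaussian over the whole window.
    Inputs are lists of 1-D feature values (a simplifying assumption;
    real systems use multi-dimensional features with full covariances).
    """
    n1, n2 = len(left), len(right)
    n = n1 + n2
    # Maximum-likelihood variances of the pooled window and of each side.
    var_all = pvariance(left + right)
    var_left = pvariance(left)
    var_right = pvariance(right)
    if min(var_all, var_left, var_right) <= 0.0:
        raise ValueError("each segment needs non-zero variance")
    # Log-likelihood gain from modelling the two sides separately.
    gain = 0.5 * (n * math.log(var_all)
                  - n1 * math.log(var_left)
                  - n2 * math.log(var_right))
    # The split adds one mean and one variance, i.e. 2 parameters for d = 1.
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


# Made-up data: two segments with clearly different means.
segment_a = [0.0, 0.1, -0.1, 0.05, -0.05] * 4
segment_b = [5.0, 5.1, 4.9, 5.05, 4.95] * 4
change_detected = delta_bic(segment_a, segment_b) > 0
```

In practice the test is applied at successive candidate boundaries within a sliding window, and `penalty_weight` is tuned on development data.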
The concern in this study is the approach to evaluating the performance of the open-set speaker identification process. In essence, such a process involves first identifying the speaker model in the database that best matches the given test utterance, and then determining if the test utterance has actually been produced by the speaker associated with the best-matched model. Whilst, conventionally, the performance of each of these two sub-processes is evaluated independently, it is argued that the use of a measure of performance for the complete process can provide a more useful basis for comparing the effectiveness of different systems. Based on this argument, an approach to assessing the performance of open-set speaker identification is considered in this paper, which is in principle similar to the method used for computing the diarisation error rate. The paper details the above approach for assessing the performance of open-set speaker identification and presents an analysis of its characteristics.
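The two-stage process described above, and the idea of scoring it as a whole, can be sketched as follows. This is an illustrative assumption rather than the measure defined in the paper: a trial is counted as an error whenever the final decision (a speaker label, or `None` for "unknown") differs from the ground truth, so misidentifications, false acceptances and false rejections are pooled into one rate. All names and the example scores are hypothetical.

```python
def open_set_identify(scores, threshold):
    """Stage 1: pick the best-matched model. Stage 2: verify it.

    scores    -- dict mapping registered speaker id to a match score
    threshold -- minimum score for accepting the best match
    Returns the accepted speaker id, or None if the utterance is
    judged to come from an unknown speaker.
    """
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def overall_error_rate(trials, threshold):
    """Fraction of trials whose final decision differs from the truth.

    trials -- list of (scores, truth) pairs, where truth is a speaker id
              or None for an utterance from an unregistered speaker
    """
    errors = sum(
        1 for scores, truth in trials
        if open_set_identify(scores, threshold) != truth
    )
    return errors / len(trials)


# Made-up trials: two registered speakers, one out-of-set utterance.
trials = [
    ({"a": 2.0, "b": 0.5}, "a"),   # correct identification and acceptance
    ({"a": 0.2, "b": 0.4}, None),  # unknown speaker, correctly rejected
    ({"a": 1.5, "b": 1.0}, "b"),   # wrong model is the best match
    ({"a": 0.3, "b": 0.9}, "b"),   # correct identification and acceptance
]
rate = overall_error_rate(trials, threshold=0.8)
```

Pooling the errors in this way gives one figure per operating threshold, which makes it straightforward to compare systems at a common operating point.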