This is an author produced version of a paper published in:
) computation in limited suspect speech data conditions obtaining good calibration performance. Robustness is achieved by the use of speaker-independent information, adapting it to the specificities of the suspect involved in the process. Thus, this procedure allows the system to weight the relevance of the suspect specificities depending on the amount of suspect data available via MAP estimation. Experimental results show robustness to suspect data scarcity and stable performance for any amount of suspect material. Also, the proposed technique outperforms other previously proposed non-adaptive approaches. Results are presented as discrimination capabilities (DET plots), distributions of ). The use of such evaluation metrics allows us to highlight the importance of calibration in the performance of a forensic system.
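The weighting idea described in this abstract (relying on speaker-independent information when suspect data is scarce, and on suspect-specific data as more of it becomes available) can be illustrated with the textbook relevance-MAP update of a Gaussian mean. This is only a minimal sketch under stated assumptions, not the paper's estimator: the function name `map_adapt_mean`, the relevance factor value, and the restriction to a single scalar mean are all hypothetical choices made for illustration.

```python
def map_adapt_mean(prior_mean, suspect_scores, relevance=16.0):
    """Relevance-MAP estimate of a Gaussian mean (illustrative sketch).

    prior_mean     -- mean taken from speaker-independent data (assumed given)
    suspect_scores -- list of observations available for the suspect
    relevance      -- hypothetical relevance factor; larger values trust
                      the speaker-independent prior for longer
    """
    n = len(suspect_scores)
    if n == 0:
        # No suspect data at all: fall back entirely on the prior.
        return prior_mean
    sample_mean = sum(suspect_scores) / n
    # The adaptation weight grows towards 1 as suspect data accumulates.
    alpha = n / (n + relevance)
    return alpha * sample_mean + (1.0 - alpha) * prior_mean


if __name__ == "__main__":
    prior = 0.0
    for count in (0, 4, 16, 1600):
        estimate = map_adapt_mean(prior, [2.0] * count)
        print(count, round(estimate, 3))
```

With few observations the estimate stays close to the speaker-independent prior, and with many it converges to the suspect's own sample mean, which is the qualitative behaviour the abstract attributes to MAP estimation.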
Abstract: Recognition of speaker identity based on modeling the streams produced by phonetic decoders (phonetic speaker recognition) has gained popularity during the past few years. Two of the major problems that arise when phone-based systems are being developed are the possible mismatches between the development and evaluation data and the lack of transcribed databases. Data-driven segmentation techniques provide a potential solution to these problems because they do not use transcribed data and can easily be applied to development data, minimizing the mismatches. In this paper we compare speaker recognition results using phonetic and data-driven decoders. To this end, we have compared the results obtained with a speaker recognition system based on data-driven acoustic units and phonetic speaker recognition systems trained on Spanish and English data. Results obtained on the NIST 2005 Speaker Recognition Evaluation data show that the data-driven approach outperforms the phonetic one and that further improvements can be achieved by combining both approaches.
In the language recognition area, Parallel Phone Recognition followed by Language Modelling (PPRLM) is one of the most widespread approaches. Although all PPRLM systems are based on the same ideas, the performance achieved by such systems depends heavily on multiple design parameters that have to be defined. As part of our preparation for the 2005 NIST Language Recognition Evaluation we have explored the effect of some of these parameters. Some of them are very common in the design of PPRLM systems, such as the number of underlying phonetic recognisers, the normalisations used or the amount of training data available. Others, like the possibility of using unlabelled speech to train phonetic recognisers or changing the complexity of the phonetic recognisers, are less common and provide ways to achieve slight improvements without more labelled speech.