Text-independent talker identification with neural networks

Rudasi, L.; Zahorian, Stephen A.

doi:10.1109/icassp.1991.150358

Cited by 47 publications

(23 citation statements)

References 3 publications


“…If the average output value is bigger than a threshold, the speaker is accepted (Oglesby and Mason, 1990). Rudasi and Zahorian (1991) demonstrated that by using small binary networks for distinguishing between two speakers instead of one large network with one output for each known speaker, the performance in speaker recognition was much better, since the binary networks were much more specialised. Another kind of networks, the time-delay neural networks (TDNN), were developed by Bennani and Gallinari (1991) to capture transient information using a connectionist approach.…”

Section: Gaussian Mixture Models (mentioning)

confidence: 99%
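The acceptance rule in the first sentence of the statement above (accept when the average output exceeds a threshold) can be sketched as follows. This is only an illustration: the function name `accept_speaker`, the per-frame output list, and the default threshold of 0.5 are assumptions, not details taken from Oglesby and Mason (1990).

```python
from statistics import mean


def accept_speaker(frame_outputs, threshold=0.5):
    """Return True if the mean network output exceeds the threshold.

    frame_outputs: hypothetical per-frame outputs of a network trained
    for the claimed speaker (values assumed to lie in [0, 1]).
    threshold: assumed decision threshold; a real system would tune it.
    """
    if not frame_outputs:
        raise ValueError("need at least one frame output")
    return mean(frame_outputs) > threshold


print(accept_speaker([0.9, 0.8, 0.4]))  # mean is 0.7, above 0.5
print(accept_speaker([0.2, 0.3, 0.6]))  # mean is about 0.37, below 0.5
```

Averaging over frames before thresholding means a single poorly scored frame does not by itself reject the claimed speaker.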

Fusing prosodic and acoustic information for speaker recognition

Farrús

2009

IJSLL


PhD dissertation: Fusing Prosodic and Acoustic Information for Speaker Recognition, Mireia Farrús i Cabeceran

Abstract: Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has seen increasing use in applications such as access control, transaction authentication, law enforcement, forensics, and system customisation, among others. One of the central questions addressed by this field is what it is in the speech signal that conveys speaker identity. Traditionally, automatic speaker recognition systems have relied mostly on short-term features related to the spectrum of the voice. However, human speaker recognition relies on other sources of information; therefore, there is reason to believe that these sources can also play an important role ...



“…Others are not suited to handle a large number of classes. On the other hand, even when using an approach which can deal with large scale problems, an adequate decomposition of the classification problem into subproblems can be favorable to the overall computational complexity as well as to the generalization ability of the global classifier [17,3,20].…”

Section: ] (mentioning)

confidence: 99%

Improved pairwise coupling classification with correcting classifiers

Moreira

Mayoraz

1998

Machine Learning: ECML-98


Abstract. The benefits obtained from the decomposition of a classification task involving several classes into a set of smaller classification problems involving only two classes, usually called dichotomies, have been shown on various occasions. Among the multiple ways of applying this decomposition, Pairwise Coupling is one of the best known. Its principle is to separate a pair of classes in each binary subproblem, ignoring the remaining ones, resulting in a decomposition scheme containing as many subproblems as the number of possible pairs of classes in the original task. Pairwise Coupling decomposition has so far been used in different applications. In this paper, various ways of recombining the outputs of all the classifiers solving the existing subproblems are explored, and an important handicap of its intrinsic nature is exposed, which consists in the use, for the classification, of impertinent information. A solution for this problem is suggested and it is shown how it can significantly improve the classification accuracy. In addition, a powerful decomposition scheme derived from the proposed correcting procedure is presented.

Keywords: Classification, decomposition into binary subproblems, pairwise coupling.
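The abstract mentions recombining the outputs of the pairwise classifiers. One simple recombination, plain majority voting, is sketched below. It is shown only as an illustration and is not claimed to be any of the schemes studied in the paper; the function name `recombine_by_voting` and the alphabetical tie-break are assumptions.

```python
from collections import Counter


def recombine_by_voting(pair_winners):
    """Pick the class that wins the most pairwise subproblems.

    pair_winners: dict mapping a class pair (a, b) to the class that the
    hypothetical binary classifier for that pair preferred.
    Ties are broken by choosing the smallest label (an arbitrary choice).
    """
    votes = Counter(pair_winners.values())
    best = max(votes.values())
    return sorted(c for c, v in votes.items() if v == best)[0]


decisions = {("a", "b"): "a", ("a", "c"): "a", ("b", "c"): "c"}
print(recombine_by_voting(decisions))  # "a" wins two of three subproblems
```

Note that the classifier for the pair ("b", "c") still casts a vote even when the true class is "a". This appears to be the kind of irrelevant input the abstract describes as a handicap of the scheme.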


“…The spectral/temporal features result in substantially higher classification rates for vowels than can be obtained by simply concatenating multiple frames of static features. This new feature set has been used to obtain vowel classification results of 70.9% for 16 vowels of the DARPA/TIMIT data base, higher than any other previously reported results ([1], [4], [5]).…”

Section: Discussion (mentioning)

confidence: 56%

“…The pattern classification approach used in this study is called a binary paired partitioning (BPP) neural network [5,6]. This classification approach partitions an N-way classification task into N*(N-1)/2 two-way classification tasks.…”

Section: Classifier (mentioning)

confidence: 99%
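The partitioning described in the statement above, an N-way task split into N*(N-1)/2 two-way tasks, can be enumerated directly. The sketch below only lists the class pairs; it does not train any network, and the helper name `pairwise_tasks` is an assumption introduced for illustration.

```python
from itertools import combinations


def pairwise_tasks(labels):
    """Return every unordered pair of class labels, one per two-way task."""
    return list(combinations(labels, 2))


# For the 16 vowel classes mentioned in the citing paper:
tasks = pairwise_tasks(range(16))
print(len(tasks))  # 16 * 15 / 2 = 120 two-way tasks
```

The number of two-way tasks grows quadratically with the number of classes, so the approach trades one large classifier for many small ones.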

Smoothed time/frequency features for vowel classification

Nossair

Zahorian

Proceedings of IEEE-SP International Symposium on Time-Frequency and Time-Scale Analysis

Self Cite


A novel signal modeling technique is described to compute smoothed time-frequency features for encoding speech information. These time-frequency features compactly and accurately model phonetic information, while accounting for the main effects of contextual variations. These segment-level features are computed such that more emphasis is given to the center of the segment and less to the end regions. For phonetic classification, the features are relatively insensitive to both time and frequency resolution, at least insofar as changes in window length and frame spacing are concerned. A 60-dimensional feature space based on this modeling technique resulted in 70.9% accuracy for classification of 16 vowels extracted from the TIMIT data base in speaker-independent experiments. These results are higher than any other results reported in the literature for the same task.
