Using cross-decoder phone co-occurrences in phonotactic language recognition

Peñagarikano, Mikel; Varona, Amparo; Rodríguez-Fuentes, Luis Javier; Bordel, Germán

doi:10.1109/icassp.2010.5495056

Cited by 3 publications (8 citation statements); references 10 publications.


“…In [1], a complete representation of phone co-occurrences was used, so that SVM vectors comprised between 2000 and 3000 unigrams for 2-decoder configurations and more than 124000 unigrams for a 3-decoder configuration. Under such a complete representation, including bigrams and trigrams of phone co-occurrences in SVM vectors was prohibitive.…”

Section: Approach 1: N-grams of Phone Co-occurrences (mentioning; confidence: 99%)
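The claim that bigrams and trigrams were "prohibitive" follows from simple arithmetic: over a vocabulary of V co-occurrence unigrams, a complete representation has V^n possible n-grams of order n. The sketch below uses the upper figures quoted above (3000 for 2-decoder, 124000 for 3-decoder) as illustrative values; the results are upper bounds on the feature-space size, not counts of n-grams actually observed in data.

```python
def ngram_space(vocab_size: int, max_order: int) -> dict[int, int]:
    """Number of possible n-grams for n = 1..max_order over vocab_size units.

    Illustrative upper bound only: a complete representation reserves one
    dimension per possible n-gram, whether or not it ever occurs.
    """
    return {n: vocab_size ** n for n in range(1, max_order + 1)}


# Vocabulary sizes taken from the quoted figures (illustrative, not exact).
for label, v in [("2-decoder (upper end)", 3000), ("3-decoder", 124000)]:
    sizes = ngram_space(v, 3)
    print(label, {n: f"{s:.2e}" for n, s in sizes.items()})
```

With 124000 unigrams, the bigram space alone exceeds 1.5 × 10^10 dimensions, which is why only unigrams were used under the complete representation.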

“…Two variants of the approach presented in [1] are proposed. In the first one, SVM vectors consist of counts of up to 3-grams (instead of just unigrams) of 2-phone and 3-phone co-occurrences.…”

Section: Introduction (mentioning; confidence: 99%)
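The first variant treats each cross-decoder co-occurrence (a tuple of phone labels, one per decoder) as a token and counts its n-grams up to order 3. The sketch below is a minimal illustration under stated assumptions: the co-occurrence sequence is taken as given (how frames are grouped into tokens is not specified here), and the phone labels are hypothetical.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def ngram_counts(tokens: Sequence[Hashable], max_order: int = 3) -> Counter:
    """Count all n-grams (n = 1..max_order) of a token sequence.

    Each key is a tuple of n consecutive tokens; here every token is itself
    a tuple of phone labels (one per decoder), i.e. a phone co-occurrence.
    """
    counts: Counter = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical sequence of 2-phone co-occurrences: (decoder A label, decoder B label).
seq = [("a", "o"), ("a", "e"), ("t", "t"), ("a", "o"), ("a", "e")]
counts = ngram_counts(seq, 3)
print(counts[(("a", "o"), ("a", "e"))])  # → 2
```

These counts (possibly normalized) would then fill the SVM feature vector, one dimension per observed n-gram of co-occurrences.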

“…Recently, a simple approach has been proposed which takes into account cross-decoder phone co-occurrences at the frame level [1]. In that approach, phone segmentation is extracted as side information from 1-best phone decodings, and allows us to consider the co-occurrence of N phone labels (one per decoder) at each frame.…”

Section: Introduction (mentioning; confidence: 99%)
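The frame-level idea described above can be sketched as follows: expand each decoder's 1-best phone segmentation into one label per frame, then read the labels of all decoders side by side. This is an illustrative sketch, not the authors' code; it assumes each segmentation is a contiguous list of hypothetical `(phone, start_frame, end_frame)` segments (end exclusive) starting at frame 0. The `collapse_repeats` helper, which merges consecutive identical co-occurrences into one unit, is a hypothetical option and not something the quote specifies.

```python
from itertools import groupby

# (phone label, start frame, end frame exclusive); assumed contiguous from frame 0.
Segment = tuple[str, int, int]


def frame_labels(segments: list[Segment]) -> list[str]:
    """Expand a 1-best phone segmentation into one phone label per frame."""
    labels: list[str] = []
    for phone, start, end in segments:
        labels.extend([phone] * (end - start))
    return labels


def frame_cooccurrences(decodings: list[list[Segment]]) -> list[tuple[str, ...]]:
    """Tuple of N phone labels (one per decoder) at each frame.

    zip() stops at the shortest decoding, so only frames covered by all
    decoders produce a co-occurrence.
    """
    per_decoder = [frame_labels(segs) for segs in decodings]
    return list(zip(*per_decoder))


def collapse_repeats(frames: list[tuple[str, ...]]) -> list[tuple[str, ...]]:
    """Hypothetical option: merge runs of identical co-occurrences into one unit."""
    return [key for key, _ in groupby(frames)]


# Hypothetical decodings of the same 5-frame utterance by two decoders.
dec_a = [("a", 0, 3), ("b", 3, 5)]
dec_b = [("x", 0, 2), ("y", 2, 5)]
frames = frame_cooccurrences([dec_a, dec_b])
print(frames)                    # → [('a', 'x'), ('a', 'x'), ('a', 'y'), ('b', 'y'), ('b', 'y')]
print(collapse_repeats(frames))  # → [('a', 'x'), ('a', 'y'), ('b', 'y')]
```

The time alignment that fully uncoupled decodings discard is exactly what `zip` exploits here: labels are paired only because they occupy the same frame.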

“…These decodings are processed and scored in a fully uncoupled way, their time alignment (and the information that may be extracted from it) being completely lost. Recently, a new approach to phonotactic language recognition has been presented [1], which takes into account time alignment information, by considering cross-decoder phone co-occurrences at the frame level, under two language modeling paradigms: smoothed n-grams and Support Vector Machines (SVM). Experiments on the NIST LRE2007 database demonstrated that using phone co-occurrence statistics could improve the performance of baseline phonotactic recognizers.…”

Section: Introduction (mentioning; confidence: 99%)
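Under the SVM paradigm, co-occurrence counts must be mapped to fixed-dimension feature vectors. The sketch below shows one common way to do this, assumed here rather than taken from the paper: index every feature seen in training, normalize each utterance's counts to relative frequencies, and score with a linear model. The function names are hypothetical, and the weights would come from a linear SVM trainer (the citing paper's abstract mentions LIBLINEAR); they are not learned here.

```python
from collections import Counter


def build_vocabulary(training_counts: list[Counter]) -> dict:
    """Assign one feature index to every feature observed in training."""
    vocab: dict = {}
    for counts in training_counts:
        for feat in counts:
            if feat not in vocab:
                vocab[feat] = len(vocab)
    return vocab


def to_sparse_vector(counts: Counter, vocab: dict) -> dict[int, float]:
    """Relative-frequency vector {index: value}; unseen features are dropped.

    Relative-frequency normalization is an assumption of this sketch.
    """
    known = {f: c for f, c in counts.items() if f in vocab}
    total = sum(known.values())
    if total == 0:
        return {}
    return {vocab[f]: c / total for f, c in known.items()}


def linear_score(x: dict[int, float], weights: dict[int, float], bias: float = 0.0) -> float:
    """Linear SVM decision value w·x + b for a sparse input vector."""
    return sum(v * weights.get(i, 0.0) for i, v in x.items()) + bias


# Hypothetical co-occurrence features (strings stand in for label tuples).
vocab = build_vocabulary([Counter({"ax": 2, "ay": 1}), Counter({"by": 3})])
x = to_sparse_vector(Counter({"ax": 1, "zz": 5, "by": 3}), vocab)
print(x)                                                  # → {0: 0.25, 2: 0.75}
print(linear_score(x, {0: 2.0, 2: -1.0}, bias=0.5))       # → 0.25
```

In a language recognition setup, one such model per target language would produce one score per utterance, and those scores are what later calibration and fusion operate on.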

“…As for n-grams, the number of possible k-phone co-occurrences increases exponentially with k, so in this work only 2-phone and 3-phone co-occurrences will be considered. In experiments on the NIST LRE2007 database, using Brno University of Technology (BUT) decoders for Czech, Hungarian and Russian [9], it was shown that fusing baseline phonotactic systems with systems based on cross-decoder phone co-occurrences led to improved performance in all the cases (see [1] for details). However, systems based on cross-decoder phone co-occurrences did not outperform the baseline phonotactic systems.…”

Section: Introduction (mentioning; confidence: 99%)
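The exponential growth in k can be made concrete: a k-phone co-occurrence takes one label from each of k decoders, so the number of possible co-occurrences for a given set of decoders is the product of their phone inventory sizes. The inventory sizes below (40, 50, 60) and decoder names are hypothetical round numbers chosen for illustration; they are not the inventories of the BUT decoders.

```python
from itertools import combinations
from math import prod


def cooccurrence_space(inventory_sizes: dict[str, int], k: int) -> dict[tuple[str, ...], int]:
    """Possible k-phone co-occurrences for every choice of k decoders.

    For each subset of k decoders, the count is the product of their
    phone inventory sizes (one label per decoder at each frame).
    """
    return {
        combo: prod(inventory_sizes[d] for d in combo)
        for combo in combinations(inventory_sizes, k)
    }


sizes = {"A": 40, "B": 50, "C": 60}  # hypothetical inventory sizes
print(cooccurrence_space(sizes, 2))  # → {('A', 'B'): 2000, ('A', 'C'): 2400, ('B', 'C'): 3000}
print(cooccurrence_space(sizes, 3))  # → {('A', 'B', 'C'): 120000}
```

Each additional decoder multiplies the space by its inventory size, which is why moving from 2-phone to 3-phone co-occurrences jumps from thousands to over a hundred thousand possible units.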


Improved Modeling of Cross-Decoder Phone Co-Occurrences in SVM-Based Phonotactic Language Recognition

Peñagarikano, Varona, Rodríguez-Fuentes et al., 2011. IEEE Trans. Audio Speech Lang. Process.


Most common approaches to phonotactic language recognition deal with several independent phone decodings. These decodings are processed and scored in a fully uncoupled way, their time alignment (and the information that may be extracted from it) being completely lost. Recently, a new approach to phonotactic language recognition has been presented [1], which takes into account time alignment information, by considering cross-decoder phone co-occurrences at the frame level, under two language modeling paradigms: smoothed n-grams and Support Vector Machines (SVM). Experiments on the NIST LRE2007 database demonstrated that using phone co-occurrence statistics could improve the performance of baseline phonotactic recognizers. In this paper, two variants of the cross-decoder phone co-occurrence SVM-based approach are proposed, by considering: (1) n-grams (up to 3-grams) of phone co-occurrences; and (2) co-occurrences of phone n-grams (up to 3-grams). To evaluate these approaches, a selection of open software (Brno University of Technology phone decoders, LIBLINEAR and FoCal) was used, and experiments were carried out on the NIST LRE2007 database. Unlike the systems presented in [1], the two approaches presented in this paper outperformed the baseline phonotactic system, yielding around 16% relative improvement in terms of EER. The best fused system attained a 1.88% EER (a 30% improvement over the baseline system), which supports the use of cross-decoder dependencies for language modeling.
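The second variant, co-occurrences of phone n-grams, pairs phone n-grams across decoders instead of single phones. The abstract does not spell out the exact definition, so the sketch below adopts one plausible reading as an explicit assumption: at each frame, each decoder contributes the phone n-gram ending with the phone active in that frame, and frames where any decoder has fewer than n phones so far are skipped. Counting is per frame, which is also an assumption. Segments are hypothetical `(phone, start_frame, end_frame)` tuples (end exclusive), assumed contiguous from frame 0.

```python
from collections import Counter
from typing import Optional

# (phone label, start frame, end frame exclusive); assumed contiguous from frame 0.
Segment = tuple[str, int, int]


def frame_ngrams(segments: list[Segment], n: int) -> list[Optional[tuple[str, ...]]]:
    """For each frame, the phone n-gram ending with the phone active at that frame.

    Frames inside the first n-1 segments get None (no complete n-gram yet).
    """
    phones = [p for p, _, _ in segments]
    out: list[Optional[tuple[str, ...]]] = []
    for j, (_, start, end) in enumerate(segments):
        gram = tuple(phones[j - n + 1:j + 1]) if j >= n - 1 else None
        out.extend([gram] * (end - start))
    return out


def ngram_cooccurrence_counts(decodings: list[list[Segment]], n: int) -> Counter:
    """Per-frame counts of cross-decoder co-occurrences of phone n-grams."""
    per_decoder = [frame_ngrams(segs, n) for segs in decodings]
    counts: Counter = Counter()
    for frame in zip(*per_decoder):
        if all(g is not None for g in frame):
            counts[frame] += 1
    return counts


# Hypothetical 6-frame decodings from two decoders, phone bigrams (n = 2).
dec_a = [("a", 0, 2), ("b", 2, 4), ("c", 4, 6)]
dec_b = [("x", 0, 3), ("y", 3, 6)]
counts = ngram_cooccurrence_counts([dec_a, dec_b], n=2)
print(counts)  # → Counter({(('b', 'c'), ('x', 'y')): 2, (('a', 'b'), ('x', 'y')): 1})
```

With n = 1 this reduces to ordinary frame-level phone co-occurrence counts, so the first approach of [1] is the special case of this reading.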

