2011
DOI: 10.1109/tasl.2011.2134088

Improved Modeling of Cross-Decoder Phone Co-Occurrences in SVM-Based Phonotactic Language Recognition

Abstract: Most common approaches to phonotactic language recognition deal with several independent phone decodings. These decodings are processed and scored in a fully uncoupled way, so their time alignment (and the information that may be extracted from it) is completely lost. Recently, a new approach to phonotactic language recognition has been presented [1], which takes time alignment information into account by considering cross-decoder phone co-occurrences at the frame level, under two language modeling paradigms: …

Cited by 12 publications (7 citation statements)
References 63 publications
“…Since the subspace-based pattern (7) is not suitable for classifiers designed only for vectorial inputs, we need a dissimilarity-based learning algorithm that depends only on the distance metric (9) to discriminate utterances for the training and detection phases, which is briefly summarized as follows [26]: 1) In the training phase, we first construct the n × n dissimilarity matrix, where n denotes the total number of training utterances, and each entry (i, j) corresponds to the dissimilarity between the pair of utterances (subspaces) i and j computed through the Projection metric (9). Thus, the i-th row of this matrix represents a new n-dimensional feature vector (called the dissimilarity vector) of utterance i.…”
Section: Dissimilarity-based Learning Scheme
Citation type: mentioning
confidence: 99%
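
The training step quoted above can be sketched as follows. This is a minimal illustration, not the citing paper's implementation: it assumes each utterance is already represented by an orthonormal basis of a k-dimensional subspace, and it assumes the standard definition of the Projection metric, d = sqrt(k - sum over i, j of (u_i . v_j)^2), since equation (9) itself is not reproduced here. The function names `projection_metric` and `dissimilarity_matrix` are hypothetical.

```python
import math


def projection_metric(U, V):
    """Projection metric between two subspaces of equal dimension k.

    U and V are lists of k basis vectors (lists of floats), assumed
    orthonormal. The value equals sqrt(sum_i sin^2(theta_i)) over the
    principal angles theta_i (assumed definition, see lead-in).
    """
    k = len(U)
    overlap = sum(
        sum(a * b for a, b in zip(u, v)) ** 2 for u in U for v in V
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, k - overlap))


def dissimilarity_matrix(subspaces):
    """Return the n x n matrix of pairwise dissimilarities.

    Row i is the n-dimensional dissimilarity vector of utterance i.
    """
    n = len(subspaces)
    return [
        [projection_metric(subspaces[i], subspaces[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy data: three one-dimensional subspaces (lines) in the plane.
subspaces = [[[1.0, 0.0]], [[0.0, 1.0]], [[2 ** -0.5, 2 ** -0.5]]]
D = dissimilarity_matrix(subspaces)
```

Each row of `D` can then be passed to any classifier that expects fixed-length vectors, which is the point of the dissimilarity-based scheme.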
“…In their work, each phone or sound sequence was represented by a high-dimensional phonotactic feature vector of n-gram counts or term frequency-inverse document frequency (TF-IDF) weights, whose dimensionality is equal to the total number of phonotactic patterns needed to characterize the structure of the utterance given by a decoder. Moreover, Penagarikano et al. took time alignment information into account by considering time-synchronous cross-decoder phone co-occurrences [9]. They thus defined a new concept of multiphone labels, which attempts to integrate the contributions given by several decoders frame by frame and form a VSM-based label sequence different from the conventional n-gram patterns.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
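
The phonotactic feature vectors described in this statement can be illustrated with a short sketch. It assumes a phone decoding is a plain list of phone labels and uses one common TF-IDF variant (relative frequency times log(N / document frequency)); the cited works may weight terms differently. The names `ngram_counts` and `tfidf_vectors` are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(phones, max_n=3):
    """Count all phone n-grams of order 1..max_n in one decoding."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts


def tfidf_vectors(utterances, max_n=3):
    """Map each utterance to a TF-IDF vector over the shared n-gram vocabulary."""
    counts = [ngram_counts(u, max_n) for u in utterances]
    vocab = sorted(set().union(*counts))
    n_docs = len(utterances)
    # Document frequency: number of utterances containing each n-gram.
    doc_freq = Counter(gram for c in counts for gram in c)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid division by zero on empty input
        vectors.append(
            [(c[g] / total) * math.log(n_docs / doc_freq[g]) for g in vocab]
        )
    return vocab, vectors


# Toy decodings with made-up phone labels.
vocab, vectors = tfidf_vectors([["a", "b", "a"], ["a", "c"]], max_n=2)
```

The vector dimensionality equals the vocabulary size, which grows quickly with the n-gram order; this is why such vectors are described as high-dimensional.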
“…In previous works, we have presented a new approach to phonotactic language recognition which uses statistics of Cross-Decoder Phone Co-occurrences (CDPC) at the frame level starting from 1-best phone strings in [9], and from lattices in [10]. CDPC take into account the simultaneous (time-synchronous) presence of two phone units (co-occurrences) coming from two different phone decoders.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
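
A frame-level co-occurrence count for two 1-best decodings might look like the sketch below. It assumes each decoding is a list of `(phone, start_frame, end_frame)` segments that are contiguous, start at frame 0, and use an exclusive end frame; this segment format and the function names are illustrative assumptions, and the lattice-based variant is not covered.

```python
from collections import Counter


def to_frames(segments):
    """Expand (phone, start, end) segments into one phone label per frame."""
    frames = []
    for phone, start, end in segments:
        frames.extend([phone] * (end - start))
    return frames


def cooccurrence_counts(dec_a, dec_b):
    """Count frames in which phone x (decoder A) coincides with phone y (decoder B)."""
    # zip stops at the shorter decoding if the two lengths differ.
    return Counter(zip(to_frames(dec_a), to_frames(dec_b)))


# Toy decodings of the same 5-frame utterance by two decoders.
dec_a = [("a", 0, 3), ("b", 3, 5)]
dec_b = [("x", 0, 2), ("y", 2, 5)]
counts = cooccurrence_counts(dec_a, dec_b)
```

Here the pair `("a", "y")` is counted only for the single frame where the two segments overlap, which is exactly the alignment information that independent per-decoder scoring discards.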
“…In experiments on the NIST LRE2007 database, fusing baseline phonotactic systems with systems based on cross-decoder phone co-occurrences led to improved performance in all the cases (see [7] for details). The approach described above was extended in [8], by considering counts of up to 3-grams (instead of just unigrams) of 2-phone and 3-phone co-occurrences in an SVM classifier. Additionally, a second approach was also introduced in [8], which did not consider n-grams of phone co-occurrences, but co-occurrences of phone n-grams (up to 3-grams).…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…The approach described above was extended in [8], by considering counts of up to 3-grams (instead of just unigrams) of 2-phone and 3-phone co-occurrences in an SVM classifier. Additionally, a second approach was also introduced in [8], which did not consider n-grams of phone co-occurrences, but co-occurrences of phone n-grams (up to 3-grams). In this paper, we present the latest developments attained under this second approach, which uses statistics of co-occurrences of phone n-grams (up to 4-grams) in an SVM-based phonotactic language recognizer.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
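
The first extension mentioned in these statements, n-grams of 2-phone co-occurrences, can be sketched as below. The sketch pairs time-aligned frame labels from two decoders into multiphone labels and counts n-grams over that label sequence. Merging runs of identical frame pairs into a single label is an assumption made here for illustration, as are the function names; the statements do not specify how the cited systems build the sequence, and the second approach (co-occurrences of phone n-grams) is not shown.

```python
from collections import Counter
from itertools import groupby


def multiphone_labels(frames_a, frames_b):
    """Pair time-aligned frame labels and merge runs of identical pairs."""
    pairs = zip(frames_a, frames_b)
    return [label for label, _ in groupby(pairs)]


def label_ngram_counts(labels, max_n=3):
    """Count n-grams (orders 1..max_n) over a multiphone label sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(labels) - n + 1):
            counts[tuple(labels[i:i + n])] += 1
    return counts


# Toy frame-level outputs of two decoders for the same 5 frames.
frames_a = ["a", "a", "a", "b", "b"]
frames_b = ["x", "x", "y", "y", "y"]
labels = multiphone_labels(frames_a, frames_b)
counts = label_ngram_counts(labels, max_n=3)
```

The resulting counts could serve as the sparse feature vector of an SVM classifier, in the same way as conventional phone n-gram counts.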