Interspeech 2016
DOI: 10.21437/interspeech.2016-1312

Multimodal Fusion of Multirate Acoustic, Prosodic, and Lexical Speaker Characteristics for Native Language Identification

Abstract: Native language identification from the acoustic signals of L2 speakers can be useful in a range of applications, such as informing automatic speech recognition (ASR), speaker recognition, and speech biometrics. In this paper we follow a multistream, multi-rate approach to native language identification in feature extraction, classification, and fusion. On the feature front we employ acoustic features such as MFCCs and PLP features at different time scales and under different transformations; we evaluate speaker no…
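The abstract describes fusing multiple feature streams (e.g. MFCC- and PLP-based classifiers) for native language identification. A minimal sketch of one common fusion strategy, score-level (late) fusion by weighted sum of per-stream log-posteriors, is shown below; the stream names, weights, and scores are illustrative assumptions, not values from the paper.

```python
# Hypothetical sketch of score-level (late) fusion across feature streams.
# Stream names, weights, and scores are illustrative, not the authors' values.
import math

def fuse_streams(stream_log_probs, weights):
    """Weighted sum of per-stream log-posteriors for each candidate L1 class."""
    classes = stream_log_probs[next(iter(stream_log_probs))].keys()
    return {c: sum(weights[s] * stream_log_probs[s][c] for s in stream_log_probs)
            for c in classes}

def predict(stream_log_probs, weights):
    """Return the class with the highest fused score."""
    fused = fuse_streams(stream_log_probs, weights)
    return max(fused, key=fused.get)

# Toy example: two streams (an MFCC-based and a PLP-based classifier),
# three candidate native languages.
scores = {
    "mfcc": {"HIN": math.log(0.5), "TEL": math.log(0.3), "ARA": math.log(0.2)},
    "plp":  {"HIN": math.log(0.4), "TEL": math.log(0.4), "ARA": math.log(0.2)},
}
weights = {"mfcc": 0.6, "plp": 0.4}
print(predict(scores, weights))  # "HIN"
```

Stream weights in such schemes are typically tuned on a development set; the paper additionally fuses streams operating at different time scales, which this sketch does not model.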

Cited by 11 publications (9 citation statements)
References 19 publications (32 reference statements)
“…The most confusable classifications are between Hindi and Telugu, which are both languages used in India. Similar observations were found in the systems reported in [11,12], although with a lower count.…”
Section: Results and Analysis (supporting)
confidence: 89%
“…It indicates that a 600-dim i-vector extracted from the posterior supervector of a GMM with 1,024 Gaussian components achieves the best performance. These results are similar to those obtained by the Challenge systems [11,12], i.e., approximately 76% for both UAR and Acc using only the NNSE corpus provided by the ComParE organizers. Thereafter, the dimension of the i-vector is fixed to 600.…”
Section: Results and Analysis (supporting)
confidence: 86%
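The citation above refers to a 600-dimensional i-vector derived from a GMM posterior supervector (1,024 components). A scaled-down sketch of the dimensionality reduction behind this, projecting a stacked supervector into a low-dimensional total-variability subspace, is below; the sizes are toy values and the matrix T is random rather than trained, so this only illustrates the dimension change, not real i-vector estimation.

```python
# Scaled-down sketch of i-vector dimensionality reduction (x ~ m + T w).
# Real systems: C = 1024 Gaussian components, F ~ 39 acoustic dims,
# supervector dim = C * F (tens of thousands), i-vector dim = 600.
# Toy sizes here; T is random, not a trained total-variability matrix.
import random

C, F = 8, 4           # toy GMM component count and feature dimension
SUPERVEC_DIM = C * F  # stacked per-component statistics
IVEC_DIM = 3          # toy total-variability subspace size

random.seed(0)
T = [[random.gauss(0, 1) for _ in range((SUPERVEC_DIM))]
     for _ in range(IVEC_DIM)]  # each row projects supervector -> one i-vector dim
supervector = [random.gauss(0, 1) for _ in range(SUPERVEC_DIM)]

# In practice w is the posterior mean of a latent factor given the utterance's
# Baum-Welch statistics; here we apply a plain projection to show the shapes.
ivector = [sum(t * s for t, s in zip(row, supervector)) for row in T]

print(len(supervector), "->", len(ivector))  # 32 -> 3
```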
“…For example, Grèzes et al. calculated the ratio of speaker overlap to aid conflict intensity estimation [13]; Montacié and Caraty detected temporal events (e.g. speech onset latency, event start time-codes, pause and phone segments) to detect cognitive load [14]; several authors extracted phone posterior-based attributes to determine the degree of nativeness or the native language of the speaker [15,16,17]; while Huckvale and Beke developed specific spectral-based attributes to detect whether the speaker has a cold [18]. Of course, some kind of fusion of the general and the task-specific attributes might also prove to be beneficial.…”
Section: Introduction (mentioning)
confidence: 99%