Interspeech 2016
DOI: 10.21437/interspeech.2016-1100

Native Language Identification Using Spectral and Source-Based Features

Abstract: The task of native language (L1) identification from non-native speech (L2) can be thought of as identifying the common traits that each group of L1 speakers maintains while speaking L2, irrespective of dialect or region. Under the assumption that the speakers are L1-proficient, non-native cues in terms of segmental and prosodic aspects are investigated in our work. In this paper, we propose the use of longer-duration cepstral features, namely, Mel frequency cepstral coefficients (MFCC) and auditor…

Cited by 9 publications (5 citation statements). References 18 publications.
“…To capture the temporal structure in the short-term features, they are usually appended with delta features, where the delta for each time frame represents the local slope of the coefficients over a small number of neighbouring frames, found by least-squares (Rajpal et al, 2016).…”
Section: Taxonomy
confidence: 99%
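The least-squares slope described in the excerpt above is the standard delta-coefficient regression; a minimal NumPy sketch (the window half-width `N` and the function name are illustrative, not taken from the cited work):

```python
import numpy as np

def delta(feats: np.ndarray, N: int = 2) -> np.ndarray:
    """Least-squares local slope of each coefficient over 2N+1 frames.

    feats: (num_frames, num_coeffs) array, e.g. MFCCs.
    Implements d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2),
    the closed-form least-squares slope over the window.
    """
    T = feats.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")  # replicate edge frames
    d = np.zeros_like(feats, dtype=float)
    for n in range(1, N + 1):
        d += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return d / denom

# On a linear ramp, interior frames recover the slope exactly.
ramp = np.arange(10, dtype=float).reshape(-1, 1)
print(delta(ramp)[5, 0])  # 1.0
```

In practice these deltas (and their deltas, the "acceleration" coefficients) are concatenated to the static features frame by frame, which is the appending the excerpt refers to.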
“…Short-term filter-bank energies have proven effective for several speech accent classification tasks (Rajpal et al, 2016; Sailor & Patil, 2016; Shon, Ali & Glass, 2018). Moreover, MFCC coefficients have little correlation with one another, which makes them suitable as input for classification models.…”
Section: Literature Review
confidence: 99%
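The low inter-coefficient correlation mentioned above comes from the DCT-II applied to log filterbank energies in the standard MFCC pipeline; a minimal sketch of just that step (function and argument names are illustrative, and this is not the cited papers' exact front end):

```python
import numpy as np

def mfcc_from_log_fbank(log_fbank: np.ndarray, num_ceps: int = 13) -> np.ndarray:
    """DCT-II over log mel filterbank energies, the decorrelating step of MFCCs.

    log_fbank: (num_frames, num_filters) log energies.
    Returns (num_frames, num_ceps) cepstral coefficients.
    """
    M = log_fbank.shape[1]
    n = np.arange(M)
    k = np.arange(num_ceps)[:, None]
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * M))  # (num_ceps, M) DCT-II basis
    return log_fbank @ basis.T

# A spectrally flat frame puts all its energy into c0; the higher
# coefficients vanish, illustrating the decorrelating effect of the DCT.
flat = np.ones((1, 26))
c = mfcc_from_log_fbank(flat)
```

Because neighbouring mel filters overlap, raw filterbank energies are strongly correlated; the cosine basis concentrates that shared structure into the low-order coefficients, which is why diagonal-covariance models handle MFCCs well.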
“…The participants were provided with the audio files (amplitude normalized) and were also pointed to the toolkit that was used to extract the audio features for the baseline system provided by the sub-challenge organizers. It is obvious that the extracted features did not reflect only the actual content of the utterances but also - and possibly more prominently - the acoustic properties of the speech that are supposedly and significantly influenced by the speaker's native language. Given the usual background of the INTERSPEECH attendees, it is only natural that most participants of the sub-challenge had a strong background in speech signal processing and (at least the top teams) concentrated on their own sophisticated methods for feature extraction.

System                        UAR (%)
1 (Abad et al, 2016)          84.6
2 (Shivakumar et al, 2016)    78.6
3 (Gosztolya et al, 2016)     70.7
4 (Huckvale, 2016)            69.8
5 (Senoussaoui et al, 2016)   68.4
6 (Keren et al, 2016)         61.5
7 (Jiao et al, 2016)          52.2
8 (Rajpal et al, 2016)        39.8
baseline                      45.1
…”
Section: Speech-based NLI
confidence: 99%
“…Studies by Hansen and Liu (2016) have shown that acoustic variations are more prominent than the linguistic variations [acoustic models performed better than linguistic models by 15.8% absolute unweighted average recall (UAR)] for major dialects of English. The acoustic variations among dialects include segmental and supra-segmental features, and they can be extracted directly from the speech signal (Behravan et al, 2016; Bougrine et al, 2018; DeMarco and Cox, 2012; Rajpal et al, 2016; Rouas, 2007) or they can be modelled indirectly from the phonetic information derived from the speech signal (Chen et al, 2011; Chen et al, 2014; Najafian et al, 2018; Shon et al, 2018a).…”
Section: Introduction
confidence: 99%