2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2018.8462026

Insights into End-to-End Learning Scheme for Language Identification

Abstract: A novel interpretable end-to-end learning scheme for language identification is proposed. It is in line with the classical GMM i-vector methods both theoretically and practically. In the end-to-end pipeline, a general encoding layer is employed on top of the front-end CNN, so that it can encode the variable-length input sequence into an utterance-level vector automatically. After comparing with the state-of-the-art GMM i-vector methods, we give insights into the CNN, and reveal its role and effect in the whole pipel…
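
The core of the pipeline described in the abstract is an encoding layer that turns a variable-length CNN feature sequence into a fixed utterance-level vector. As a rough illustration of that idea (not the authors' implementation; the temporal average pooling choice, all layer sizes, and the class count are assumptions), here is a minimal PyTorch sketch:

```python
import torch
import torch.nn as nn

class TAPEncoder(nn.Module):
    """Toy front-end CNN plus a temporal average pooling (TAP) encoding
    layer. Illustrative only: sizes are arbitrary, not the paper's config."""
    def __init__(self, n_mels=64, emb_dim=128, n_langs=10):
        super().__init__()
        # Front-end CNN over (batch, 1, n_mels, frames)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse the frequency axis
        )
        self.proj = nn.Linear(64, emb_dim)
        self.classifier = nn.Linear(emb_dim, n_langs)

    def forward(self, x):
        h = self.cnn(x)                   # (B, 64, 1, T)
        h = h.squeeze(2).transpose(1, 2)  # (B, T, 64) frame-level features
        v = h.mean(dim=1)                 # TAP: average over time -> (B, 64)
        e = self.proj(v)                  # fixed-size utterance-level vector
        return self.classifier(e)         # language logits
```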

Cited by 22 publications (21 citation statements: 0 supporting, 21 mentioning, 0 contrasting; years 2018–2024). References 28 publications.
“…Therefore, a flexible processing method should have the ability to accept speech segments of arbitrary duration. Motivated by [21, 22, 24], the whole end-to-end framework in this paper is shown in Fig. 3.…”
Section: End-to-End System Overview (mentioning)
Confidence: 99%
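
To make the arbitrary-duration point in the statement above concrete, the short continuation below reuses the hypothetical TAPEncoder from the earlier sketch: two inputs with different frame counts pass through the same network and yield outputs of identical shape. Frame counts and the 10 ms shift are illustrative.

```python
import torch  # continues the hypothetical TAPEncoder sketch above

model = TAPEncoder()
short = torch.randn(1, 1, 64, 200)    # ~2 s of frames at a 10 ms shift
long_ = torch.randn(1, 1, 64, 1000)   # ~10 s
# Pooling over time makes the output shape duration-independent.
print(model(short).shape)  # torch.Size([1, 10])
print(model(long_).shape)  # torch.Size([1, 10])
```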
“…Short-time Cepstral Mean Subtraction (CMS) is applied with a 3 s sliding window. For the end-to-end network, they use a residual network (ResNet) system with a global statistics pooling layer and a fully connected layer, and each output node represents a target dialect class [25]. The model was trained with the standard cross-entropy loss over a softmax layer.…”
Section: ADI (mentioning)
Confidence: 99%
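
The global statistics pooling described in this statement concatenates the per-utterance mean and standard deviation of the frame-level features before the fully connected classifier, and training uses softmax cross-entropy. A minimal sketch of such a head (the feature dimension, batch size, and 8-class output are assumptions, and the ResNet front end is replaced by random features):

```python
import torch
import torch.nn as nn

class StatsPooling(nn.Module):
    """Global statistics pooling: concatenate the mean and standard
    deviation of frame-level features over time."""
    def forward(self, h):  # h: (batch, frames, dim)
        return torch.cat([h.mean(dim=1), h.std(dim=1)], dim=1)  # (batch, 2*dim)

# Hypothetical head on top of ResNet frame-level features
# (dim=256 and 8 dialect classes are assumptions, not from the paper).
pool = StatsPooling()
fc = nn.Linear(2 * 256, 8)
criterion = nn.CrossEntropyLoss()   # softmax + cross-entropy in one call

frames = torch.randn(4, 300, 256)   # stand-in for ResNet outputs
labels = torch.randint(0, 8, (4,))
loss = criterion(fc(pool(frames)), labels)
loss.backward()
```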
“…2. It is surprising that although LSTM or GRU layers introduce many more parameters than the TAP layer, they result in degraded performance, especially for the testing task over long-range durations [10].…”
Section: Introduction (mentioning)
Confidence: 99%
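
The parameter contrast behind this observation is easy to verify: temporal average pooling (TAP) has no learnable weights, while a recurrent pooler of the same width carries several weight matrices. A small sketch under assumed dimensions:

```python
import torch
import torch.nn as nn

class TAP(nn.Module):
    """Temporal average pooling: a plain mean over time, no parameters."""
    def forward(self, h):              # h: (batch, frames, dim)
        return h.mean(dim=1)

class GRUPooler(nn.Module):
    """Pool by keeping the final hidden state of a GRU over the frames."""
    def __init__(self, dim=256):       # dim=256 is an assumed width
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
    def forward(self, h):
        _, last = self.gru(h)          # last: (1, batch, dim)
        return last.squeeze(0)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(TAP()))        # 0
print(n_params(GRUPooler()))  # 394752 for dim=256
```

For a 256-dimensional hidden size the GRU pooler alone adds roughly 395k parameters, which is the scale of gap the quoted statement refers to.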