2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003872

Native Language Identification from Raw Waveforms Using Deep Convolutional Neural Networks with Attentive Pooling

Cited by 5 publications (6 citation statements) · References 35 publications
“…The final output of the RNN can be either the output of the units at the last time step or the sequence of outputs over the entire time series. Attention is a mechanism proposed for RNNs and is state of the art for classification in most speech processing tasks (Qian et al., 2019; Ubale et al., 2019). To employ attention in RNNs, the outputs for all time steps of a single RNN unit are collapsed by weighted averaging, with the weights learned automatically during training.…”
Section: Taxonomy (mentioning)
confidence: 99%
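
To make the weighted-averaging step concrete, here is a minimal sketch of attention over RNN time steps, assuming PyTorch; the layer sizes, class count, and the `score` layer name are illustrative assumptions, not details taken from the cited papers:

```python
import torch
import torch.nn as nn

class AttentiveRNN(nn.Module):
    """Collapses per-time-step LSTM outputs by learned attention
    weights instead of keeping only the last time step."""
    def __init__(self, input_dim=40, hidden_dim=128, num_classes=11):
        super().__init__()
        self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)    # one scalar score per step
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                        # x: (batch, time, input_dim)
        h, _ = self.rnn(x)                       # h: (batch, time, hidden_dim)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        pooled = (w * h).sum(dim=1)              # weighted average of outputs
        return self.classifier(pooled)
```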
“…Finally, global average pooling averages the output sequence from each filter to a single value. Recently, a variant of global average pooling, attentive pooling, has been proposed for speech accent classification tasks (Ubale et al., 2019). Attentive pooling is a weighted global average whose weights are learned during training and highlight important input segments.…”
Section: Taxonomy (mentioning)
confidence: 99%
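
The following sketch shows attentive pooling as a drop-in replacement for global average pooling over a 1-D CNN's output sequence, again assuming PyTorch; the channel and frame counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    """Weighted global average over the time axis; the weights are
    learned from the data rather than fixed to 1/T."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv1d(channels, 1, kernel_size=1)  # per-frame score

    def forward(self, x):                          # x: (batch, channels, time)
        w = torch.softmax(self.score(x), dim=-1)   # (batch, 1, time)
        return (x * w).sum(dim=-1)                 # (batch, channels)

feats = torch.randn(8, 64, 200)       # stand-in for CNN feature maps
pooled = AttentivePooling(64)(feats)  # -> (8, 64)
```

With uniform weights w = 1/T this reduces exactly to global average pooling, which is why it is described as a variant of that operation.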
“…They employed a hybrid long short-term memory (LSTM) and convolutional neural network (CNN) model to automatically extract environmental and microphone features from speech. Other researchers have subsequently applied convolutional networks to audio processing [11,12,13] with promising results.…”
Section: Introduction (mentioning)
confidence: 99%
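
For orientation, a minimal sketch of one common CNN-LSTM hybrid layout for raw audio follows, assuming PyTorch; the kernel sizes, strides, and class count are illustrative assumptions and are not taken from the cited work:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Convolutional front end over the raw waveform, followed by an
    LSTM that models the resulting frame sequence."""
    def __init__(self, num_classes=11):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=80, stride=16), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, 128, batch_first=True)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, wav):               # wav: (batch, samples)
        f = self.cnn(wav.unsqueeze(1))    # (batch, 64, frames)
        h, _ = self.lstm(f.transpose(1, 2))
        return self.classifier(h[:, -1])  # classify from last time step

logits = CNNLSTM()(torch.randn(4, 16000))  # one second of 16 kHz audio
```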