2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
DOI: 10.1109/asru46091.2019.9003872

Native Language Identification from Raw Waveforms Using Deep Convolutional Neural Networks with Attentive Pooling

Cited by 5 publications (6 citation statements) · References 35 publications
“…The final output of the RNN can be either the output of the units at the last time step or the sequence of outputs over the entire time series. Attention is a mechanism proposed for RNNs and is state of the art for classification in most speech processing tasks (Qian et al., 2019; Ubale et al., 2019). To employ attention in RNNs, the outputs for all time steps of a single RNN unit are collapsed by weighted averaging, with the weights learned automatically during training.…”
Section: Taxonomy (mentioning)
confidence: 99%
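
To make the weighted-averaging step concrete, here is a minimal sketch of attention over RNN time steps, assuming PyTorch; the layer sizes, class count, and the `score` layer name are illustrative assumptions, not details taken from the cited papers:

```python
import torch
import torch.nn as nn

class AttentiveRNN(nn.Module):
    """Collapses per-time-step LSTM outputs by learned attention
    weights instead of keeping only the last time step."""
    def __init__(self, input_dim=40, hidden_dim=128, num_classes=11):
        super().__init__()
        self.rnn = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)    # one scalar score per step
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                        # x: (batch, time, input_dim)
        h, _ = self.rnn(x)                       # h: (batch, time, hidden_dim)
        w = torch.softmax(self.score(h), dim=1)  # attention weights over time
        pooled = (w * h).sum(dim=1)              # weighted average of outputs
        return self.classifier(pooled)
```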
“…Finally, global average pooling averages the output sequence from each filter to a single value. Recently, a variant of global average pooling, attentive pooling, has been proposed for speech accent classification tasks (Ubale et al., 2019). Attentive pooling is a weighted global average whose weights are learned during training and highlight important input segments.…”
Section: Taxonomy (mentioning)
confidence: 99%
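
The following sketch shows attentive pooling as a drop-in replacement for global average pooling over a 1-D CNN's output sequence, again assuming PyTorch; the channel and frame counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class AttentivePooling(nn.Module):
    """Weighted global average over the time axis; the weights are
    learned from the data rather than fixed to 1/T."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv1d(channels, 1, kernel_size=1)  # per-frame score

    def forward(self, x):                          # x: (batch, channels, time)
        w = torch.softmax(self.score(x), dim=-1)   # (batch, 1, time)
        return (x * w).sum(dim=-1)                 # (batch, channels)

feats = torch.randn(8, 64, 200)       # stand-in for CNN feature maps
pooled = AttentivePooling(64)(feats)  # -> (8, 64)
```

With uniform weights w = 1/T this reduces exactly to global average pooling, which is why it is described as a variant of that operation.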
“…They employed a hybrid long short-term memory (LSTM) and convolutional neural network (CNN) model to automatically extract environmental and microphone features from speech. Other researchers have subsequently applied convolutional networks to audio processing [11,12,13] with promising results.…”
Section: Introduction (mentioning)
confidence: 99%
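
For orientation, a minimal sketch of one common CNN-LSTM hybrid layout for raw audio follows, assuming PyTorch; the kernel sizes, strides, and class count are illustrative assumptions and are not taken from the cited work:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Convolutional front end over the raw waveform, followed by an
    LSTM that models the resulting frame sequence."""
    def __init__(self, num_classes=11):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 64, kernel_size=80, stride=16), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, stride=2), nn.ReLU(),
        )
        self.lstm = nn.LSTM(64, 128, batch_first=True)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, wav):               # wav: (batch, samples)
        f = self.cnn(wav.unsqueeze(1))    # (batch, 64, frames)
        h, _ = self.lstm(f.transpose(1, 2))
        return self.classifier(h[:, -1])  # classify from last time step

logits = CNNLSTM()(torch.randn(4, 16000))  # one second of 16 kHz audio
```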