2020 International Joint Conference on Neural Networks (IJCNN) 2020
DOI: 10.1109/ijcnn48605.2020.9206778

Learning Filterbanks from Raw Waveform for Accent Classification

Abstract: Most speech applications use mel-frequency spectral coefficients (MFSC) as features because they match the human perceptual mechanism, in which the emphasis is on vocal tract characteristics. In accent classification, however, a mel-scale distribution of filters may not always give the best representation, e.g., for pitch-accented languages, where the emphasis should also be on vocal source information. Motivated by this, we perform end-to-end classification of accents directly from waveforms, which will reduce the effort…
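
For reference, the fixed mel scale that the abstract questions can be sketched in a few lines. This is a minimal illustration, not code from the paper: it assumes the common HTK-style mapping m = 2595 log10(1 + f/700), and the function names and the 10-filter, 0 to 8 kHz settings are hypothetical. It shows that filters placed uniformly on the mel scale are packed densely at low frequencies and spread out at high frequencies. That fixed prior is what learning filterbanks from the raw waveform relaxes.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """HTK-style mel mapping (an assumption; other mel variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> list[float]:
    """Center frequencies (Hz) of n_filters filters spaced uniformly on the mel scale.

    n_filters + 2 points are placed uniformly in mel between f_min and f_max;
    the interior points are the filter centers and the outer two are band edges.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]

if __name__ == "__main__":
    centers = mel_center_frequencies(n_filters=10, f_min=0.0, f_max=8000.0)
    gaps = [b - a for a, b in zip(centers, centers[1:])]
    print([round(c) for c in centers])
    print([round(g) for g in gaps])  # spacing between neighboring filters widens with frequency
```

A learnt filterbank has no such built-in spacing, so for a given accent set it can end up placing more or fewer bands in, for example, the pitch region.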

Cited by 8 publications (5 citation statements)
References: 30 publications

“…Instead of using fixed mel-scale spectral filters in feature representations for input to CNN, data-driven learnt spectral scale filters (as convolution layer) for dialect classification are investigated. Note that learnt spectral scale filters are well known and previously used for speech recognition (Seki et al, 2017), spoofing detection (Yu et al, 2017), and accent classification (Kethireddy et al, 2020b). As per our knowledge, this is the first attempt to propose to use learnt spectral scale filters for dialect classification.…”
Section: Spectral Filters As Convolution Layer In CNN (mentioning; confidence: 91%)
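
The idea of learnt spectral filters implemented "as convolution layer" can be illustrated with a minimal, hedged sketch. This is not the cited authors' code or the paper's architecture. The block builds a few band-pass FIR kernels, convolves each one with a raw waveform in the same way a 1-D CNN convolution layer does (a "valid" cross-correlation), and pools each output to a single log energy, which gives one filterbank-like feature per kernel. The windowed-sinc initialization, the band edges, the 8 kHz sample rate, and all function names are assumptions. In a trained network the kernel taps would be weights updated by backpropagation rather than kept fixed.

```python
import math

def bandpass_kernel(f_lo, f_hi, sr, length):
    """Hamming-windowed sinc band-pass FIR kernel: low-pass(f_hi) minus low-pass(f_lo)."""
    half = (length - 1) / 2.0
    taps = []
    for n in range(length):
        t = n - half
        if t == 0:
            h = 2.0 * (f_hi - f_lo) / sr
        else:
            h = (math.sin(2 * math.pi * f_hi * t / sr)
                 - math.sin(2 * math.pi * f_lo * t / sr)) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        taps.append(h * window)
    return taps

def conv1d_valid(signal, kernel, stride=1):
    """'Valid' 1-D cross-correlation, which is what a CNN convolution layer computes."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def filterbank_log_energies(signal, kernels):
    """One feature per filter: log of the mean squared output of that filter."""
    feats = []
    for kernel in kernels:
        y = conv1d_valid(signal, kernel)
        feats.append(math.log(sum(v * v for v in y) / len(y) + 1e-10))
    return feats

if __name__ == "__main__":
    sr = 8000                                                           # assumed sample rate (Hz)
    wave = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(800)]  # 0.1 s, 1 kHz tone
    bands = [(300, 700), (800, 1200), (1500, 2500)]                     # hypothetical band edges
    kernels = [bandpass_kernel(lo, hi, sr, 101) for lo, hi in bands]
    print([round(f, 2) for f in filterbank_log_energies(wave, kernels)])
    # the 800-1200 Hz filter responds most strongly to the 1 kHz tone
```

Starting from band-pass kernels gives the layer an interpretable initial filterbank. Training can then shift or reshape the bands away from any hand-chosen scale.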
“…From the early studies on accent classification (Kat and Fung, 1999; Arslan and Hansen, 1997), it was found that the favourable spectral scale depends on the language of dialects and sub-dialects contained in it. Furthermore, from the accent classification studies with neural networks (Kethireddy et al, 2020b), it was found that the distribution of learnt frequency bands are different from standard mel-scale distribution. It was observed that learnt scale showed an improvement of 10.94% UAR (relative) over mel-scale.…”
Section: Introduction (mentioning; confidence: 99%)
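
The 10.94% figure above is a relative gain in UAR, while the 81.26% figures quoted below are accuracies. The two metrics diverge when accent classes are unbalanced. The following is a short sketch that assumes the usual definitions: UAR (unweighted average recall) is the mean of per-class recalls, and relative improvement is 100 · (new - baseline) / baseline. The toy labels and the 60 vs. 66 scores are hypothetical and are not data from the paper.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of utterances classified correctly (large classes dominate)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def unweighted_average_recall(y_true, y_pred):
    """UAR: per-class recall averaged with equal weight for every class."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def relative_improvement(new, baseline):
    """Relative gain in percent: 100 * (new - baseline) / baseline."""
    return 100.0 * (new - baseline) / baseline

if __name__ == "__main__":
    # Hypothetical toy labels (not data from the paper): accent "A" dominates.
    y_true = ["A"] * 8 + ["B"] * 2
    y_pred = ["A"] * 8 + ["A", "B"]
    print(accuracy(y_true, y_pred))                   # 0.9
    print(unweighted_average_recall(y_true, y_pred))  # 0.75
    print(relative_improvement(66.0, 60.0))           # 10.0 (hypothetical UARs of 66 vs. 60)
```

Because UAR weights each accent equally, a classifier that mostly predicts the majority accent scores lower on UAR than on accuracy.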
“…The fusion reported an accuracy of 86.05%. Kethireddy, Kadiri & Gangashetty (2020) have also used CNN with the raw audio as input and reported 81.26% accuracy over a subset of CV dataset comprising eight accents.…”
Section: Comparative Analysis (mentioning; confidence: 99%)
“…The work of Kethireddy et al [22], focused on learning filterbanks that are initialized using customized features obtained from raw waveform, which are incorporated into the CNN network for English accent recognition. The experimental results demonstrate strong performance with an accuracy of 81.26%; these results were attained by using their techniques on a common dataset of 8 English accents.…”
Section: Related Work (mentioning; confidence: 99%)