2020
DOI: 10.1007/s40747-020-00172-1

Speaker recognition based on characteristic spectrograms and an improved self-organizing feature map neural network

Abstract: To obtain a speaker’s pronunciation characteristics, a method is proposed based on an idea from bionics, which uses spectrogram statistics to achieve a characteristic spectrogram to give a stable representation of the speaker’s pronunciation from a linear superposition of short-time spectrograms. To deal with the issue of slow network training and recognition speed for speaker recognition systems on resource-constrained devices, based on a traditional SOM neural network, an adaptive clustering self-organizing …

Cited by 19 publications (8 citation statements)
References 22 publications
“…(5) The British English pronounces |i | sound, and the American English pronounces | z |; e.g., the word system is pronounced as |'sistim| in British style and |'sistzm| in American style. (6) There are some words that are pronounced completely differently in British English and American English. For example, leisure in British is |'leiz| and in American is |'li:zzr|.…”
Section: Results Analysis and Discussion (mentioning)
confidence: 99%
“…Based on the improved deep learning convolutional network, Lin et al [5] extracted multi-scale normalized local features with stacked decreasing convolution kernel, improved the convergence speed and stability of the algorithm with dynamic learning rate, and achieved better recognition rate. Jia et al [6] used three-hidden layer deep neural network to identify MCFF features of acoustic signals and achieved better recognition results than SVM and GMM. Compared with traditional classifiers, deep learning network improves the detection accuracy of pronunciation, but its huge parameter requirements, complex parameter settings, and calculation requirements require further optimization and improvement in practical applications [7].…”
Section: Introduction (mentioning)
confidence: 99%
“…Sa, Ri, Ga, Ma, Pa, Da, Ni are the seven musical notes carrying frequencies which further subdivided into semitones or microtones [33]. Full forms of Sa, Ri, Ga, Ma, Pa, Da, Ni are Shadja, Rishaba, Gandhara, Madhyama, Panchama, Dhaivatha and Nishadha [5].…”
Section: Introduction (mentioning)
confidence: 99%
“…Deep neural networks have achieved excellent results that traditional machine learning can difficultly match in various fields, such as computer vision [20], point cloud processing [15,34], medical data processing [19], speech recognition [17], and so on. With the continuous development of deep learning, the artificial neural networks are becoming more and more deep, wide and complicated.…”
Section: Introduction (mentioning)
confidence: 99%