2013
DOI: 10.1111/exsy.12030

Speaker verification using heterogeneous neural network architecture with linear correlation speech activity detection

Abstract: This paper presents a multi-level speaker verification system that uses 64 discrete Fourier transform spectrum components as input feature vectors. A speech activity detection technique is used as a pre-processing stage to identify vowel phoneme boundaries within a speech sample. A modified self-organising map (SOM) is then used to filter the speech data by using cluster information extracted from three vowels for a claimed speaker. This SOM filtering stage also provides coarse speaker verification. Finally, a…
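As a rough illustration of the input features mentioned in the abstract, the sketch below computes 64 DFT magnitude components from a single speech frame. The abstract does not say how the 64 components are obtained, so the 128-point transform, the Hamming window, and the function name `dft_features` are all assumptions made for this example, not the authors' method.

```python
import numpy as np

def dft_features(frame, n_components=64):
    """Return the first n_components DFT magnitudes of one frame.

    Assumptions (not from the paper): a DFT of length 2 * n_components,
    a Hamming window, and magnitude-only features.
    """
    frame = np.asarray(frame, dtype=float)
    n_fft = 2 * n_components              # assumed 128-point DFT
    segment = frame[:n_fft]               # truncate frames that are too long
    windowed = segment * np.hamming(len(segment))
    # rfft zero-pads short frames to n_fft and returns n_fft // 2 + 1 bins
    spectrum = np.fft.rfft(windowed, n=n_fft)
    return np.abs(spectrum[:n_components])

# Usage: a pure tone centred on bin 10 of a 128-sample frame
t = np.arange(128)
features = dft_features(np.cos(2 * np.pi * 10 * t / 128))
print(features.shape, int(np.argmax(features)))
```

A real front end would slide this over overlapping frames of the vowel regions; the frame length and overlap would need to be taken from the full paper.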

Cited by 5 publications (1 citation statement)
References 33 publications
“…The three vowels are contained in the words (two, five and eight) of the CSLU2002 database. The position of the target vector for each neuron is chosen after locating the vowel region within each word in the enrolment speech sample using the pre-processing linear correlation technique presented in the work of Tashan et al (2012). When the spike timings of an input vector are fully synchronised with the spike timing of the target vector, each input synapse connected to the spiking neuron will respond with a maximum value of 1, resulting in an output of 1 as shown in Figure 8b.…”
Section: Proposed Algorithm
Citation type: mentioning
confidence: 99%
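The synchrony behaviour described in the citation statement can be sketched as follows. The statement only says that each synapse responds with a maximum of 1 when input and target spike timings are fully synchronised, and that the neuron output is then 1. The Gaussian response kernel, the `sigma` width, the averaging of synapse responses, and the function names are hypothetical choices made for illustration, not the citing paper's model.

```python
import numpy as np

def synapse_response(input_times, target_times, sigma=1.0):
    """Per-synapse response in (0, 1]; equals 1 when the timings match.

    Assumption: a Gaussian kernel on the spike-timing difference.
    """
    dt = np.asarray(input_times, dtype=float) - np.asarray(target_times, dtype=float)
    return np.exp(-(dt ** 2) / (2.0 * sigma ** 2))

def neuron_output(input_times, target_times, sigma=1.0):
    """Neuron output as the mean synapse response (assumed aggregation)."""
    return float(np.mean(synapse_response(input_times, target_times, sigma)))

# Fully synchronised input gives the maximum output; a shifted input gives less.
print(neuron_output([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(neuron_output([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

Any kernel that peaks at 1 for a zero timing difference would reproduce the stated behaviour; the Gaussian is used here only because it is simple to write down.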