2018
DOI: 10.1016/j.csl.2017.06.007

Restricted Boltzmann machines for vector representation of speech in speaker recognition

Abstract: Over the last few years, i-vectors have been the state-of-the-art technique in speaker recognition. Recent advances in Deep Learning (DL) technology have improved the quality of i-vectors, but the DL techniques in use are computationally expensive and need phonetically labeled background data. The aim of this work is to develop an efficient alternative vector representation of speech by keeping the computational cost as low as possible and avoiding phonetic labels, which are not always accessible. The proposed …

Cited by 37 publications (18 citation statements)
References 10 publications
“…A first attempt to use RBMs at the backend in a speaker verification task was made in Reference [20]. The authors put their efforts into the front end of a speaker verification system, in order to learn a compact and fixed dimensional speaker representation in the form of a speaker vector by means of RBM adaptation [21][22][23][24]. They also make use of DBNs in the i-vector/PLDA (Probabilistic Linear Discriminant Analysis) framework for speaker verification at the backend [25].…”
Section: Introduction (mentioning)
confidence: 99%
“…In [22,23,24], several attempts have been made to improve the performance of speaker verification, using unsupervised learning such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs). As an example of frontend, in [22,25], a vector representation of speakers was proposed by means of RBM adaptation.…”
Section: Introduction (mentioning)
confidence: 99%
“…The proposed embeddings are alternatives to traditional i-vectors [18] and the recent x-vectors [19] with much lower computational cost and suitable for online processing. The proposed embedding network is a DNN taking advantage of input speaker-corrupted supervectors and efficient variable ReLU (VReLU) [20] as an activation function trying to discriminate the background speakers. Speaker corruption is performed by adding supervectors, built by only 20 speech frames randomly selected from other speakers, to the supervectors of a given speaker.…”
Section: Introduction (mentioning)
confidence: 99%