Deep belief networks for i-vector based speaker recognition

Ghahabi, Omid; Hernando, Javier

doi:10.1109/icassp.2014.6853888

Cited by 81 publications

(59 citation statements)

References 14 publications


“…In other words, the URBM is adapted to the data of each speaker. The idea of this kind of adaptation has also shown success in [11,12,13,14] to initialize the parameters of DNNs for classification purposes. Figure 3 shows the weight matrices for URBM along with its adapted versions for two randomly selected speakers.…”

Section: RBM Adaptation (mentioning)

confidence: 99%


From Features to Speaker Vectors by means of Restricted Boltzmann Machine Adaptation

Safari, Ghahabi, Hernando

2016

The Speaker and Language Recognition Workshop (Odyssey 2016)

Self Cite


Restricted Boltzmann Machines (RBMs) have shown success in different stages of speaker recognition systems. In this paper, we propose a novel framework to produce a vector-based representation for each speaker, which will be referred to as the RBM-vector. This new approach maps the speaker spectral features to a single fixed-dimensional vector carrying speaker-specific information. In this work, a global model, referred to as the Universal RBM (URBM), is trained taking advantage of RBM unsupervised learning capabilities. Then, this URBM is adapted to the data of each speaker in the development, enrolment and evaluation datasets. The network connection weights of the adapted RBMs are then concatenated and passed through a whitening stage with dimension reduction to build the speaker vectors. The evaluation is performed on the core test condition of the NIST SRE 2006 database, and it is shown that RBM-vectors achieve 15% relative improvement in terms of EER compared to i-vectors using cosine scoring. The score fusion with i-vectors attains more than 24% relative improvement. This fusion result is of interest because both vectors are produced in an unsupervised fashion and can be used instead of the i-vector/PLDA approach when no data labels are available. Results obtained with the RBM-vector/PLDA framework are comparable to those of i-vector/PLDA. Their score fusion achieves 14% relative improvement compared to i-vector/PLDA.
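The pipeline described in this abstract (train a universal RBM, adapt it per speaker, flatten the adapted weights, whiten with dimension reduction, score with cosine similarity) can be illustrated with a rough NumPy sketch. Everything specific here is an assumption rather than the authors' configuration: a Gaussian-Bernoulli RBM with unit-variance visible units, full-batch CD-1 training, PCA whitening, synthetic data, and all sizes, learning rates, epoch counts and function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, W, b_vis, b_hid, epochs, lr, rng):
    """Full-batch CD-1 for a Gaussian-Bernoulli RBM (unit-variance visibles).
    Starts from the given parameters and returns updated copies."""
    W, b_vis, b_hid = W.copy(), b_vis.copy(), b_hid.copy()
    n = data.shape[0]
    for _ in range(epochs):
        h_prob = sigmoid(data @ W + b_hid)                # positive phase
        h_samp = (rng.random(h_prob.shape) < h_prob).astype(float)
        v_recon = h_samp @ W.T + b_vis                    # mean of Gaussian visibles
        h_recon = sigmoid(v_recon @ W + b_hid)            # negative phase
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_vis, b_hid

def adapt_and_flatten(urbm, frames, rng, epochs=5, lr=0.01):
    """Adapt the universal RBM to one speaker's frames (a few CD-1 epochs
    starting from the URBM parameters) and flatten the adapted weights."""
    W, _, _ = train_rbm(frames, *urbm, epochs=epochs, lr=lr, rng=rng)
    return W.ravel()

def fit_whitener(X, dim):
    """PCA whitening with dimension reduction, estimated on development vectors."""
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scale = s[:dim] / np.sqrt(X.shape[0] - 1)             # per-component std dev
    return mean, Vt[:dim].T / scale                       # projection of shape (D, dim)

def cosine_score(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Synthetic stand-in for spectral features (hypothetical sizes).
rng = np.random.default_rng(0)
n_feat, n_hid = 8, 4

def fake_speaker(n_frames=200):
    # Each fake speaker gets its own mean vector plus frame-level noise.
    return rng.normal(size=n_feat) + 0.5 * rng.normal(size=(n_frames, n_feat))

dev = [fake_speaker() for _ in range(10)]

# 1) Universal RBM trained on the pooled development frames.
init = (0.01 * rng.normal(size=(n_feat, n_hid)), np.zeros(n_feat), np.zeros(n_hid))
urbm = train_rbm(np.vstack(dev), *init, epochs=20, lr=0.01, rng=rng)

# 2) Per-speaker adaptation, then 3) whitening fitted on the development vectors.
dev_vectors = np.stack([adapt_and_flatten(urbm, s, rng) for s in dev])
mean, P = fit_whitener(dev_vectors, dim=3)
whitened = (dev_vectors - mean) @ P

# 4) Cosine scoring of an enrolment/trial pair built the same way.
enrol = (adapt_and_flatten(urbm, fake_speaker(), rng) - mean) @ P
trial = (adapt_and_flatten(urbm, fake_speaker(), rng) - mean) @ P
score = cosine_score(enrol, trial)
print("cosine score:", score)
```

The sketch uses only the weight matrix of each adapted RBM, following the abstract's wording about concatenating "network connection weights"; whether biases are also included, and how the adaptation is actually scheduled, is not stated in this excerpt.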


“…They have been utilized in an adaptation process [11,12,13,14], to further discriminatively model target and impostor speakers. RBMs have been recently used in DBNs as a pre-training stage to extract Baum-Welch statistics for i-vector and supervector extraction [15,16].…”

Section: Introduction (mentioning)

confidence: 99%

From Features to Speaker Vectors by means of Restricted Boltzmann Machine Adaptation

Safari, Ghahabi, Hernando

2016

The Speaker and Language Recognition Workshop (Odyssey 2016)

Self Cite


“…As a typical deep learning model, DBN solves the training problem which may occur in a deep neural network. It is widely used in many different areas in the recent years, such as graphics processing and language recognition [12][13][14]. DBN is advanced model which can fit the complex nonlinear relationship between attributes in many issues [15,16].…”

Section: Related Work (mentioning)

confidence: 99%

“…It has been widely used in many different fields such as image [12], speech [13], and language processing [14]. It is a multilayer model which imitates the mode in which the human brain represents information.…”

Section: Introduction (mentioning)

confidence: 99%

Deep Belief Networks for Fingerprinting Indoor Localization Using Ultrawideband Technology

Luo, Gao

2016

International Journal of Distributed Sensor Networks


With the increasing demand for localization services in indoor environments, indoor localization techniques have drawn a lot of attention. In recent years, fingerprinting localization techniques have proved effective in indoor localization tasks. Due to the complexity and variability of indoor environments, some traditional geometric localization techniques based on time of arrival (TOA), received signal strength (RSS), or direction of arrival (DOA) may cause large position errors. Unlike common geometric localization methods, fingerprinting localization techniques estimate the position of the target by creating a pattern matching model or regression model for the measurements. Therefore, a suitable learning model is the key to a fingerprinting localization system. This paper presents a fingerprinting-based localization technique using a deep belief network (DBN) and ultrawideband (UWB) signals in an office environment. Some location-dependent parameters extracted from the channel impulse response (CIR) are used as signatures to build the fingerprinting database. The construction of the DBN, which is based on the fingerprinting database, is also discussed in this paper. Experimental results show that, with an appropriate fingerprinting database and model structure, the localization system can achieve the desired accuracy.


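“…In addition, to exploit the ability of DNNs to represent the nonlinear relationship between input sequences and output labels, research has also been carried out on using a DNN directly as the recognizer in speaker recognition. [4,5] In speaker recognition, DNN-based classification has shown higher performance than the previously used support vector machine (SVM) and cosine distance based classification methods. As in speaker recognition, DNNs have also shown high performance in age recognition [6,7] and language recognition.…”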

unclassified

Multiple Discriminative DNNs for I-Vector Based Open-Set Language Recognition

Kang, Cho, Kang et al.

2016

The Journal of Korean Institute of Communications and Informati


In this paper, we propose an i-vector based language recognition system to identify the spoken language of the speaker, which uses multiple discriminative deep neural network (DNN) models analogous to the multi-class support vector machine (SVM) classification system. The proposed model was trained and tested using the …
