Spoken language identification using i-vectors, x-vectors, PLDA and logistic regression

Abdurrahman, Ahmad Iqbal; Zahra, Amalia

doi:10.11591/eei.v10i4.2893

Cited by 9 publications

(7 citation statements)

References 19 publications

“…From the above result, the decision boundary is the threshold applied to the returned probability score, which lies between 0 and 1 [36]. Finally, the cost function is convex, so minimizing it finds the global minimum of logistic regression.…”

Section: Logistic Regression (mentioning)

confidence: 99%
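
The citing statement above mentions a probability score between 0 and 1, a decision threshold, and a convex cost function. A minimal sketch of those three pieces follows, assuming a binary logistic regression trained by plain gradient descent. The function names, the toy data, the 0.5 threshold, and the learning rate are illustrative assumptions, not details taken from the cited paper.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    """Probability of class 1 for one feature vector x."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def predict_label(weights, bias, x, threshold=0.5):
    """Apply the decision threshold (0.5 is an assumed default)."""
    return 1 if predict_proba(weights, bias, x) >= threshold else 0


def log_loss(weights, bias, X, y, eps=1e-12):
    """Mean binary cross-entropy, the convex cost being minimized."""
    total = 0.0
    for x, t in zip(X, y):
        p = min(max(predict_proba(weights, bias, x), eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(X)


def fit(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean cross-entropy."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, t in zip(X, y):
            err = predict_proba(weights, bias, x) - t
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Toy one-dimensional data (hypothetical): small values are class 0.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
weights, bias = fit(X, y)
```

Because the cost is convex, gradient descent from any starting point moves toward the same minimum. That is the property the citing statement refers to.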

Performance analysis of demand forecasting in energy consumption based on ensemble model

Dhanalakshmi

Ayyanathan

2022

Bulletin EEI

Over the previous decade, energy usage has increased exponentially all over the world. Machine learning algorithms are used to classify the demand and requirement of the off-peak and evening-peak load in southern regional load dispatch centre (SRLDC) data. In this paper, data are classified day-wise by demand and requirement, for both evening-peak and off-peak load, across the southern regional grid states of Andhra Pradesh, Karnataka, Kerala, Tamilnadu, and Pondicherry. Machine learning algorithms such as k-nearest neighbors (KNN), random forest, and logistic regression are adopted as classifiers. To improve model efficiency, an ensemble learning method is used to increase accuracy. State-wise performance is measured by classifying each state's demand and requirement against its energy consumption with the different classification algorithms; the combined ensemble model improves on them, reaching an accuracy of 86%.
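
The abstract above combines KNN, random forest, and logistic regression in an ensemble but does not say how the predictions are combined. The sketch below assumes hard majority voting, one common choice. The prediction lists and labels are made up for illustration and are not SRLDC data.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists by taking the most common label per sample."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical outputs of three already-trained classifiers on four samples.
knn_preds = [1, 0, 1, 1]
forest_preds = [1, 1, 0, 1]
logreg_preds = [0, 0, 0, 1]
truth = [1, 0, 0, 1]

ensemble_preds = majority_vote([knn_preds, forest_preds, logreg_preds])
```

In this toy case each model makes one mistake on a different sample, so the vote corrects all three errors. Real gains depend on how correlated the models' errors are.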

“…In speaker verification, the universal background model (UBM) is a speaker model that represents broad attributes and characteristics that can be compared to the specific person being verified [1]. Later, i-vector and x-vector [2] based ASV systems were introduced to replace the Gaussian mixture model-universal background model (GMM-UBM) based ASV systems. Deep learning approaches [3], such as a recurrent neural network (RNN) [4] used as a backend classifier, have shown capability in speaker verification with a low equal error rate (EER).…”

Section: Introduction (mentioning)

confidence: 99%

Artificial speech detection using image-based features and random forest classifier

Tan

Hijazi

Kok

et al. 2022

IJ-AI

The ASVspoof 2015 Challenge was one of the efforts of the research community in the field of speech processing to foster the development of generalized countermeasures against spoofing attacks. However, most countermeasures submitted to the ASVspoof 2015 Challenge failed to detect the S10 attack effectively, the only attack that was generated using the waveform concatenation approach. Hence, more informative features are needed to detect previously unseen spoofing attacks. This paper presents an approach that uses data transformation techniques to engineer image-based features together with a random forest classifier to detect artificial speech. The objectives are two-fold: (i) to extract image-based features from the mel-frequency cepstral coefficient representation of the speech signal and (ii) to compare the performance of the extracted features with a random forest classifier against existing approaches in determining the authenticity of voices. An audio-to-image transformation technique was used to engineer new features for classifying genuine and spoofed voices. An experiment was conducted to find the appropriate combination of engineered features and classifier. Experimental results showed that the proposed approach was able to detect speech synthesis and voice conversion attacks effectively, with an equal error rate of 0.10% and accuracy of 99.93%.
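
The equal error rate (EER) reported above is the operating point where the false acceptance rate and the false rejection rate are equal. Below is a minimal sketch of how it can be estimated from two lists of detection scores. It assumes that higher scores mean "more likely genuine" and that only thresholds at observed score values are tried. The function name and the scores are hypothetical.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Estimate the EER and the threshold at which it occurs.

    A trial is accepted when its score is >= threshold.
    FAR = fraction of spoof trials accepted.
    FRR = fraction of genuine trials rejected.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best = None
    for th in thresholds:
        far = sum(s >= th for s in spoof_scores) / len(spoof_scores)
        frr = sum(s < th for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        # Keep the threshold where FAR and FRR are closest to each other.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, th)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, spoof)
```

With these toy scores, one genuine trial scores below one spoof trial, so the two error rates meet at 25%. Evaluation toolkits usually interpolate between thresholds rather than picking the nearest observed score.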

“…Many studies on language identification have been conducted, with various feature extraction and classification techniques being used. Several techniques are used to extract features from the audio data, including phone recognition followed by language modeling (PRLM) [5] and parallel phone recognition followed by language modeling (PPRLM) [5] for the phonetic approach, or perceptual linear prediction (PLP) [5], mel-frequency cepstral coefficient (MFCC) [6]-[8], i-vector [8], [9] and x-vector [10] for the acoustic approach. Neural networks [11], convolutional neural networks (CNN) [12], [13], logistic regression (LR) [8], PLDA [14], Gaussian mixture model (GMM) [15], [16], and support vector machine [17], [18] are among the techniques used to classify the language spoken.…”

Section: Introduction (mentioning)

confidence: 99%

“…These findings were obtained using a dataset of speech corpora in three Indonesian local languages (Javanese, Sundanese, and Minangkabau) that were independently recorded. Abdurrahman et al. [8] used an acoustic approach with i-vector and x-vector feature extraction and with probabilistic linear discriminant analysis (PLDA) and LR classification to study three Indonesian local languages. As a result, the x-vector performs best when using PLDA, while the i-vector outperforms the x-vector when using LR.…”

Section: Introduction (mentioning)

confidence: 99%

Spoken language identification on 4 Indonesian local languages using deep learning

Wijonarko

Zahra

2022

Bulletin EEI

Self Cite

Language identification is at the forefront of assistance in many applications, including multilingual speech systems, spoken language translation, multilingual speech recognition, and human-machine interaction via voice. The identification of Indonesian local languages using spoken language identification technology has enormous potential to advance tourism and digital content in Indonesia. The goal of this study is to identify four Indonesian local languages: Javanese, Sundanese, Minangkabau, and Buginese, utilizing deep learning classification techniques such as artificial neural network (ANN), convolutional neural network (CNN), and long short-term memory (LSTM). Mel-frequency cepstral coefficients (MFCC) are used as the features extracted from the audio data. The results showed that the LSTM model had the highest accuracy for each speech duration (3 s, 10 s, and 30 s), followed by the CNN and ANN models.
