An Automated System for Regional Nativity Identification of Indian speakers from English Speech

Guntur, Radha Krishna; Krishnan, Rajesh; Mittal, Vinay Kumar

doi:10.1109/indicon47234.2019.9028980

Cited by 17 publications

(1 citation statement)

References 22 publications


“…There are several types of embeddings, such as i-vectors and x-vectors. Both have been widely used in audio classification tasks (Wang et al., 2020; Weninger et al., 2019; Krishna et al., 2019; Adeeba and Hussain, 2019).…”

Section: Acoustic Modeling (mentioning)

confidence: 99%

Deep learning applied to speech processing: development of novel models and techniques

Carofilis Vasco


This thesis proposes and evaluates new machine learning techniques and models for different tasks in the field of speech processing. It mainly addresses the identification of speakers, languages, and accents using several descriptor proposals based on different sound representations. In addition, it presents a new transfer learning technique based on a new descriptor, and two new architectures for deep learning models based on complementary audio representations.

The new transfer learning technique is built on a descriptor we call Grad-Transfer, which is derived from the model interpretability method Gradient-weighted Class Activation Mapping (Grad-CAM). Grad-CAM generates a heat map of the most relevant zones in the input data according to their influence on a given model prediction. For the development of Grad-Transfer, we experimentally demonstrate, using Birch and k-means clustering algorithms, that the heat maps generated by the Grad-CAM method are able to store part of the knowledge acquired by a deep learning speech processing model fed by spectrograms during its training process. We exploited this capability of Grad-CAM to formulate a new technique that transfers knowledge from a pre-trained model to an untrained one, through the Grad-Transfer descriptor, which is responsible for summarizing and reusing such knowledge. Several Grad-Transfer-based models were evaluated for the accent identification task using the Voice Cloning Toolkit dataset. These models include Gaussian Naive Bayes, Support Vector Machines, and Passive Aggressive classifiers. Experimental results show an increase in performance of up to 23.58% in models fed by Grad-Transfer descriptors and spectrograms compared to models fed by spectrograms alone. This demonstrates the ability of Grad-Transfer to improve the performance of speech processing models and opens the door to new implementations for similar tasks.

On the other hand, new transfer learning approaches based on embedding generation models were evaluated. Embeddings are generated by machine learning models trained for a specific task on large datasets. By exploiting the knowledge already acquired, these models can be reused for new tasks where the amount of available data is small. This thesis proposes a new architecture for deep learning models, called Mel and Wave Embeddings for Human Voice Tasks (MeWEHV), capable of generating robust embeddings for speech processing. MeWEHV combines embeddings generated by a pre-[…]

Overall, this thesis presents several advances in the areas of speaker, language, and accent identification, and proposes new techniques and models that use transfer learning to improve the performance of the state-of-the-art models evaluated.
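As a rough illustration of the Grad-CAM heat map mentioned in the abstract, the sketch below computes the standard Grad-CAM map from a convolutional layer's activations and the gradients of a class score with respect to them. It is not the thesis's Grad-Transfer implementation, whose details the abstract does not give: the function names `grad_cam` and `heatmap_descriptor` are hypothetical, and flattening the heat map into a feature vector is only an assumed, simplified stand-in for a descriptor that a classical classifier could consume.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Standard Grad-CAM heat map.

    activations, gradients: arrays of shape (K, H, W), i.e. K feature maps
    and the gradient of the class score with respect to each of them.
    """
    # One weight per feature map: the spatial average of its gradient.
    weights = gradients.mean(axis=(1, 2))              # shape (K,)
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(weights, activations, axes=1)   # shape (H, W)
    # Keep only regions with a positive influence on the class score.
    cam = np.maximum(cam, 0.0)
    # Scale to [0, 1] when the map is not all zeros.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


def heatmap_descriptor(cam):
    """Hypothetical descriptor: the heat map flattened to a 1-D vector."""
    return cam.ravel()


# Toy input: two 2x2 feature maps; only the first one receives gradient.
toy_activations = np.array([[[1.0, 2.0], [3.0, 4.0]],
                            [[5.0, 5.0], [5.0, 5.0]]])
toy_gradients = np.array([[[2.0, 2.0], [2.0, 2.0]],
                          [[0.0, 0.0], [0.0, 0.0]]])
toy_cam = grad_cam(toy_activations, toy_gradients)
toy_features = heatmap_descriptor(toy_cam)
```

In a real pipeline the activations and gradients would come from a trained spectrogram model via a deep learning framework's automatic differentiation; the arrays here are made-up values used only to exercise the arithmetic.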


Identification of Indian English by Speakers of Multiple Native Languages

Guntur, Ramakrishnan, Mittal

2021

Communications in Computer and Information Science


Enhancing Signal in Noisy Environment: A Review

Devi, Mittal

2021

Smart Innovation, Systems and Technologies


Noise is present in all environments, including in the signal processing domain. In any application, the signal obtained after processing should be free from noise, so eliminating and reducing noise is important for obtaining a high-quality signal. Noise types include salt-and-pepper, Gaussian, speckle, and Poisson noise, among others. To eliminate such noise, noise reduction techniques or filters can be applied. This survey paper covers the elimination of noise in images captured in dust, rain, snow, etc.; in speech signals or mobile audio in highly noisy environments such as conferences, meetings, and heavy traffic; and in biomedical signals in the presence of other noisy signals. Many filtering techniques can be used for this purpose, and the filtered image obtained indicates which technique is best suited to a specific purpose.
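To make the idea of a noise reduction filter concrete, here is a minimal sketch of a median filter, a common choice for salt-and-pepper (impulse) noise. The abstract does not say which filters the review covers, so this is only an assumed example; the function name `median_filter_1d` is hypothetical, and a 1-D signal is used instead of an image to keep the sketch short.

```python
import numpy as np


def median_filter_1d(signal, window=3):
    """Replace each sample with the median of its neighbourhood.

    window must be a positive odd integer; edges are padded by
    repeating the first and last samples.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    pad = window // 2
    padded = np.pad(np.asarray(signal, dtype=float), pad, mode="edge")
    # Each row of `windows` is one neighbourhood of length `window`.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=1)


# A flat signal with a single impulse ("salt") at index 2.
noisy = np.array([1.0, 1.0, 9.0, 1.0, 1.0])
cleaned = median_filter_1d(noisy, window=3)
```

A median filter suppresses isolated outliers without averaging them into neighbouring samples, which is why it is often preferred over a mean filter for impulse noise. The sketch relies on `sliding_window_view`, which requires NumPy 1.20 or later.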

