Colombian Dialect Recognition Based on Information Extracted from Speech and Text Signals

Escobar-Grisales, Daniel; Ríos-Urrego, Cristian David; Lopez-Santander, D. A.; Gallo-Aristizabal, J. D.; Vásquez-Correa, Juan Camilo; Nöth, Elmar; Orozco-Arroyave, Juan Rafael

doi:10.1109/asru51503.2021.9687890

Cited by 5 publications

(2 citation statements)

References 15 publications


“…The recent application of the transformer-based wav2vec 2.0 showcased its utility in developing speech-based age and gender prediction models, including cross-corpus evaluation, with significant improvements in recall compared to a classic modeling approach based on hand-crafted features [47]. Additionally, wav2vec 2.0 representations of speech were found to be more effective in distinguishing between PD and HC subjects compared to language representations, including word-embedding models [48]. As a pre-trained model, wav2vec shares the advantage with TRILLsson and x-vectors of being directly applicable without the need for further training, addressing the data-hungry nature common to many neural networks in the field.…”

Section: Related Work (mentioning)

confidence: 99%

Analyzing wav2vec embedding in Parkinson’s disease speech: A study on cross-database classification and regression tasks

Klempir, Krupicka. 2024. Preprint.

Advancements in deep learning speech representations have facilitated the effective use of extensive datasets comprised of unlabeled speech signals, and have achieved success in modeling tasks associated with Parkinson's disease (PD) with minimal annotated data. This study focuses on the non-fine-tuned wav2vec 1.0 architecture applied to PD. Utilizing features derived from wav2vec embedding, we develop machine learning models tailored for clinically relevant PD speech diagnosis tasks, such as cross-database classification and regression to predict demographic and articulation characteristics, for instance, modeling the subjects' age and number of characters per second. The primary aim is to conduct feature importance analysis on both classification and regression tasks, investigating whether latent discrete speech representations in PD are shared across models, particularly for related tasks. The proposed wav2vec-based models were evaluated on PD versus healthy controls using three multi-language-task PD datasets. Results indicated that wav2vec accurately detected PD based on speech, outperforming feature extraction using mel-frequency cepstral coefficients in the proposed cross-database scenarios. Furthermore, wav2vec proved effective in regression, modeling various quantitative speech characteristics related to intelligibility and aging. Subsequent analysis of important features, obtained using scikit-learn feature importance built-in tools and the Shapley additive explanations method, examined the presence of significant overlaps between classification and regression models. The feature importance experiments discovered shared features across trained models, with increased sharing for related tasks, further suggesting that wav2vec contributes to improved generalizability. In conclusion, the study proposes wav2vec embedding as a promising step toward a speech-based universal model to assist in the evaluation of PD.


“…Deep learning models offer more promise than conventional ones. We'll test this further [20]. After then, the sounds are transformed into text using the speech-to-text module of IBM Watson.…”

Section: Literature Review (mentioning)

confidence: 99%

An Overview of Speech-To-Text Conversion

Aggarwal. 2023. CIML.

As a result of developments in science and technology, an automatic speech-to-text (STT) conversion system has been available. This system converts spoken words into text that can be read visually. People with trouble hearing may use this technology to communicate in other ways, including understanding voice communication and being able to follow directions using their visual abilities. There are instances when seeing something is more powerful than listening to something, particularly in long-distance communication; thus, speech-to-text conversion is crucial in situations like these. One of the fascinating developments to occur in the twenty-first century is the advent of machine learning. It has evolved from its roots in neurology studies conducted in the 1940s into something like artificial intelligence humans have created. Neural networks, a collection of complex structures, are the basis of machine learning. When combined with optimization techniques, these networks mimic the behaviour of neurons in the human brain and allow a computer to learn from its experiences. Here we explore one of many potential uses for such structures - the analysis of vocal performance in an original study. In particular, we dissect voice recognition systems to determine their inner workings.


Colombian Dialect Recognition from Call-Center Conversations Using Fusion Strategies

Escobar-Grisales, Ríos-Urrego, Gallo-Aristizabal, et al. 2022. Communications in Computer and Information Science.
