2021
DOI: 10.7717/peerj-cs.650
Spatial position constraint for unsupervised learning of speech representations

Abstract: The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims at utilizing unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, whereby deep auto-encoder variants have been most successful in finding such representations. This paper proposes a novel mechanism to incorporate geometric position of speech samples within the global struc…

Cited by 3 publications (2 citation statements)
References 29 publications
“…Another study also analyzed the sliding window's effectiveness for extracting spectral or cepstral features [7], whilst Paliwal et al. measured the impact of time window duration on speech recognition [8]. Research shows that unsupervised compression of cepstral speech features further enhances classification accuracy for certain tasks [9]. Acoustic features of speech vary significantly for various speaker accents.…”
Section: Literature Review (mentioning)
confidence: 99%
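The statement above describes a two-step pipeline: sliding-window cepstral (MFCC) feature extraction followed by unsupervised compression of those features. The sketch below is purely illustrative, assuming librosa for MFCC extraction and PCA as a stand-in unsupervised compressor; the file path, window sizes, and dimensionalities are assumptions, not values from the cited papers.

```python
# Illustrative sketch: sliding-window MFCC extraction followed by
# unsupervised compression. PCA stands in for an auto-encoder-style
# compressor; all parameters below are assumed, not from the papers.
import librosa
from sklearn.decomposition import PCA

# Placeholder path to a speech recording (assumption).
y, sr = librosa.load("speech.wav", sr=16000)

# Sliding-window cepstral features: 25 ms frames with a 10 ms hop.
frame_len = int(0.025 * sr)
hop_len = int(0.010 * sr)
mfcc = librosa.feature.mfcc(
    y=y, sr=sr, n_mfcc=13, n_fft=frame_len, hop_length=hop_len
)  # shape: (13, n_frames)

# Unsupervised compression of the per-frame cepstral vectors.
frames = mfcc.T                                          # (n_frames, 13)
compressed = PCA(n_components=6).fit_transform(frames)   # (n_frames, 6)

print(frames.shape, "->", compressed.shape)
```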
“…The pre-training helps the model learn the text structure utilising massive unlabelled text data scraped from the web. The unsupervised pre-training has been proven very effective for a wide range of classification tasks [6].…”
Section: Introduction (mentioning)
confidence: 99%