Yaroslav Getman scite author profile

Yaroslav Getman

5 Publications

14 Citation Statements Received

41 Citation Statements Given

Affiliations

Aalto University

Publications

Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks

Moisio

Porjazovski

Rouhe

et al. 2022

Lang Resources & Evaluation

The Donate Speech campaign has so far succeeded in gathering approximately 3600 h of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus. The corpus includes over twenty thousand speakers from all the regions of Finland and from all age brackets. The primary goals of the collection were to create a representative, large-scale resource to study spontaneous spoken Finnish and to accelerate the development of language technology and speech-based services. In this paper, we present the collection process and the collected corpus, and showcase its versatility through multiple use cases. The evaluated use cases include automatic speech recognition of spontaneous speech; detection of age, gender, dialect, and topic; and metadata analysis. We provide benchmarks for the use cases, as well as downloadable, trained baseline systems with open-source code for reproducibility. One further use case is to verify the metadata and transcripts given in this corpus itself, and to suggest artificial metadata and transcripts for the parts of the corpus where they are missing.

Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering

Grósz

Porjazovski

Getman

et al. 2022

wav2vec2-based Speech Rating System for Children with Speech Sound Disorder

Getman

Al-Ghezi

Voskoboinik

et al. 2022

Speaking is a fundamental way of communication, developed at a young age. Unfortunately, some children with speech sound disorder struggle to acquire this skill, hindering their ability to communicate efficiently. Speech therapies, which could aid these children in speech acquisition, greatly rely on speech practice trials and accurate feedback about their pronunciations. To enable home therapy and lessen the burden on speech-language pathologists, we need a highly accurate and automatic way of assessing the quality of speech uttered by young children. Our work focuses on exploring the applicability of state-of-the-art self-supervised, deep acoustic models, mainly wav2vec2, for this task. The empirical results highlight that these self-supervised models are superior to traditional approaches and close the gap between machine and human performance.

Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children

et al. 2023

Automated Assessment of Task Completion in Spontaneous Speech for Finnish and Finland Swedish Language Learners

Voskoboinik

Getman

Al-Ghezi

et al. 2023

This study investigates the feasibility of automated content scoring for spontaneous spoken responses from Finnish and Finland Swedish learners. Our experiments reveal that pretrained Transformer-based models outperform the tf-idf baseline in automatic task completion grading. Furthermore, we demonstrate that pre-fine-tuning these models to differentiate between responses to distinct prompts enhances subsequent task completion fine-tuning. We observe that task completion classifiers exhibit accelerated learning and produce predictions with stronger correlations to human grading when accounting for task differences. Additionally, we find that employing similarity learning, as opposed to conventional classification fine-tuning, further improves the results. It is especially helpful to learn not just the similarities between the responses in one score bin, but also the exact differences between the average human scores that the responses received. Lastly, we demonstrate that models applied to both manual and ASR transcripts yield comparable correlations to human grading.
