Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech

Smith, Daniel; Sneddon, Alex; Ward, Lauren; Duenser, Andreas; Freyne, Jill; Silvera-Tawil, David; Morgan, Angela

doi:10.21437/interspeech.2017-455

Cited by 12 publications

(5 citation statements)

References 19 publications


“…Domain adaptation is a well studied problem in automatic speech recognition. Multiple techniques have been developed for the adaptation of acoustic models, such as transfer learning [1,2] and feature mapping [3,4]. Semi-supervised and lightly supervised adaptation techniques use a base model trained on out-of-domain supervised data to generate targets on in-domain unsupervised data.…”

Section: Introduction (mentioning)

confidence: 99%

Untranscribed Web Audio for Low Resource Speech Recognition

Carmantini, Bell, Renals

2019

Interspeech 2019


Speech recognition models are highly susceptible to mismatch in the acoustic and language domains between the training and the evaluation data. For low resource languages, it is difficult to obtain transcribed speech for target domains, while untranscribed data can be collected with minimal effort. Recently, a method applying lattice-free maximum mutual information (LF-MMI) to untranscribed data has been found to be effective for semi-supervised training. However, weaker initial models and domain mismatch can result in high deletion rates for the semi-supervised model. Therefore, we propose a method to force the base model to overgenerate possible transcriptions, relying on the ability of LF-MMI to deal with uncertainty. On data from the IARPA MATERIAL programme, our new semi-supervised method outperforms the standard semi-supervised method, yielding significant gains when adapting for mismatched bandwidth and domain.
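The citation statement above describes semi-supervised adaptation as using a base model trained on out-of-domain supervised data to generate targets on in-domain unsupervised data. A minimal sketch of that general idea (plain self-training with pseudo-labels) follows. It is not the LF-MMI method of the abstract, nor the approach of any cited paper: a nearest-centroid classifier on synthetic one-dimensional features stands in for an acoustic model, and every name here (`centroids`, `predict`, `accuracy`, the data sets) is hypothetical.

```python
import random


def centroids(samples, labels):
    """Return the mean feature value of class 0 and class 1.

    Assumes both classes occur in `labels` (true for the toy data below).
    """
    means = []
    for c in (0, 1):
        members = [x for x, y in zip(samples, labels) if y == c]
        means.append(sum(members) / len(members))
    return means


def predict(model, samples):
    """Assign each sample to the class whose centroid is nearest."""
    return [0 if abs(x - model[0]) <= abs(x - model[1]) else 1 for x in samples]


def accuracy(model, samples, labels):
    predictions = predict(model, samples)
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


rng = random.Random(0)
n = 200

# Out-of-domain data WITH labels: classes centred at -1 and +1.
source_x = [rng.gauss(-1.0, 0.3) for _ in range(n)] + [rng.gauss(1.0, 0.3) for _ in range(n)]
source_y = [0] * n + [1] * n

# In-domain data, shifted by +0.8. Labels are used only for evaluation,
# never for training.
target_x = [rng.gauss(-0.2, 0.3) for _ in range(n)] + [rng.gauss(1.8, 0.3) for _ in range(n)]
target_y = [0] * n + [1] * n

# 1) Train the base model on the labelled out-of-domain data.
base = centroids(source_x, source_y)

# 2) Let the current model generate targets (pseudo-labels) on the
#    unlabelled in-domain data, re-estimate the model from them, and repeat.
adapted = base
for _ in range(3):
    pseudo_labels = predict(adapted, target_x)
    adapted = centroids(target_x, pseudo_labels)

print("base model on target data:   ", accuracy(base, target_x, target_y))
print("adapted model on target data:", accuracy(adapted, target_x, target_y))
```

In this toy setting the decision boundary drifts toward the shifted target classes over a few rounds. Real systems typically also filter or weight pseudo-labels by confidence, which this sketch omits.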



“…We are in the process of collecting further data from 120 children with SSD in the Ultrax2020 project following the protocol described in [36]. In addition, we intend to add other available data to our repository, including adult data and alternative forms of articulatory imaging techniques (e.g., MRI of vocal tracts), all of which can be used in data augmentation methods [17,15,10]. We encourage other researchers to contribute by submitting their data for us to standardise and add to this repository.…”

Section: Discussion (mentioning)

confidence: 99%

“…Machine learning has the potential to automate much of this work, leading to better outcomes for patients without increasing workload for pathologists, but publicly available data that could facilitate this work is scarce. Existing work reports results on adult data [7,8,9], data that is not publicly available [10], or data that is in proprietary format [11,12,13]. Additionally, child speech processing and disordered speech processing are both known to present many challenges [14,15,16,17].…”

Section: Introduction (mentioning)

confidence: 99%

UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

Eshky, Ribeiro, Cleland et al. 2018

Interspeech 2018


We introduce UltraSuite, a curated repository of ultrasound and acoustic data, collected from recordings of child speech therapy sessions. This release includes three data collections, one from typically developing children and two from children with speech sound disorders. In addition, it includes a set of annotations, some manual and some automatically produced, and software tools to process, transform and visualise the data.


“…Much of the recent research on ASR for children has been focused on how data on adult speech can be used during training to improve recognition for children. Authors of [48, 49] investigated how to fine-tune models trained on adult speech recognition with child data. Fine-tuning is the process of first training a machine learning model on one domain with large amounts of data (in this case adult speech) and then retraining either parts of the model or the whole model on the target domain (here children’s speech).…”

Section: Technical Perspective On Development Of LSA Software (mentioning)

confidence: 99%

Multidisciplinary Perspectives on Automatic Analysis of Children’s Language Samples: Where Do We Go from Here?

Lüdtke, Bornman, Wet et al. 2022

Folia Phoniatr Logop


Background: Language sample analysis (LSA) is invaluable to describe and understand child language use and development for clinical purposes and research. Digital tools supporting LSA are available, but many of the LSA steps have not been automated. Nevertheless, programs that include automatic speech recognition (ASR), the first step of LSA, have already reached mainstream applicability. Summary: To better understand the complexity, challenges and future needs of automatic LSA, including the tasks of transcribing, annotating and analysing natural child language samples, this article takes on a multi-disciplinary view. Requirements of a fully automated LSA process are characterized, features of existing LSA software tools compared, and prior work from the disciplines of information science and computational linguistics reviewed. Key Messages: Existing tools vary in their extent of automation provided across the process of LSA. Advances in machine learning for speech recognition and processing have potential to facilitate LSA, but the specifics of child speech and language as well as the lack of child data complicate software design. A transdisciplinary approach is recommended as feasible to support future software development for LSA.
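The fine-tuning procedure described in the citation statement above (train on a large source domain, then retrain part or all of the model on the target domain) can be illustrated with a minimal sketch. It is not the method of [48, 49] or of any cited paper: a one-dimensional linear model stands in for an acoustic model, the synthetic `source` and `target` sets stand in for adult and child speech, and all names (`mse`, `train`, `trainable`) are hypothetical.

```python
import random


def mse(params, data):
    """Mean squared error of the linear model y = w*x + b on `data`."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(params, data, lr=0.05, epochs=200, trainable=("w", "b")):
    """Full-batch gradient descent on y = w*x + b.

    Only the parameters named in `trainable` are updated; the rest stay
    frozen at their incoming values.
    """
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        if "w" in trainable:
            w -= lr * grad_w
        if "b" in trainable:
            b -= lr * grad_b
    return w, b


rng = random.Random(0)

# Large source-domain set (assumed relation: y = 2x + 1 plus noise).
source = [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.1))
          for x in (rng.uniform(-1, 1) for _ in range(500))]

# Small target-domain set with a shifted offset (assumed: y = 2x + 3 plus noise).
target = [(x, 2.0 * x + 3.0 + rng.gauss(0, 0.1))
          for x in (rng.uniform(-1, 1) for _ in range(10))]

# Step 1: train on the large source domain from scratch.
pretrained = train((0.0, 0.0), source)

# Step 2: fine-tune on the small target domain, retraining only part of
# the model (the offset b) while the slope w stays frozen.
finetuned = train(pretrained, target, epochs=100, trainable=("b",))

print("target error before fine-tuning:", mse(pretrained, target))
print("target error after fine-tuning: ", mse(finetuned, target))
```

Passing `trainable=("w", "b")` in the second step would instead retrain the whole model, the other option the statement mentions. With only ten target samples, freezing most parameters is one common way to limit overfitting.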

