Automated Screening of Speech Development Issues in Children by Identifying Phonological Error Patterns

Ward, Lauren; Stefani, Alessandro; Smith, Daniel; Duenser, Andreas; Freyne, Jill; Dodd, Barbara; Morgan, Angela

doi:10.21437/interspeech.2016-850

Cited by 15 publications

(17 citation statements)

References 16 publications


“…Other approaches have exploited the fact that many assessments of speech and articulation are made with known texts. The improvement in recognition this can yield for clinical ASR has been demonstrated previously by this group [17], where an improvement of 33.1% in phoneme recognition was achieved when the target text was exploited by the constrained decoder. Other groups have had similar success with this approach, improving detection of pathological voice by 20% [18].…”

Section: Introduction (supporting)

confidence: 55%

“…The input to these models is speech made up of target words in isolation, elicited through a picture naming task. The acoustic model used in the previous system [17] used a hierarchical neural network (HNN) with long temporal context features over a 310ms window. This HNN cascaded a pair of neural networks (NN) into a third NN, with the third NN trained on the concatenated posterior probabilities of the first two NN.…”

Section: Acoustic Model (mentioning)

confidence: 99%
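The cascade described in the statement above can be sketched as a forward pass. This is only an illustration of the "two networks feeding a third" structure: the layer sizes, the `tanh` hidden layer, the random untrained weights, and the names `make_mlp`, `posteriors`, and `hnn_forward` are all assumptions, and the quoted text does not say which inputs each of the first two networks receives.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATS = 60    # hypothetical feature dimension per frame
N_PHONES = 40   # hypothetical number of phoneme classes


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def make_mlp(n_in, n_hidden, n_out):
    """One-hidden-layer network with random (untrained) weights."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def posteriors(net, x):
    """Per-frame class posteriors produced by one network."""
    h = np.tanh(x @ net["W1"] + net["b1"])
    return softmax(h @ net["W2"] + net["b2"])


# Two first-stage networks and one merger network.
net_a = make_mlp(N_FEATS, 64, N_PHONES)
net_b = make_mlp(N_FEATS, 64, N_PHONES)
net_c = make_mlp(2 * N_PHONES, 64, N_PHONES)


def hnn_forward(x_a, x_b):
    """Concatenate the first-stage posteriors and feed them to the third network."""
    merged = np.concatenate(
        [posteriors(net_a, x_a), posteriors(net_b, x_b)], axis=-1
    )
    return posteriors(net_c, merged)


frames = rng.normal(size=(5, N_FEATS))  # five synthetic frames
out = hnn_forward(frames, frames)
```

In a trained system the third network would be fitted on the concatenated posteriors of the first two, as the statement describes; here the weights are random, so only the shapes and data flow are meaningful.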

“…Research has also investigated the application of ASR to children's speech development, assessing autism spectrum disorders [13], childhood apraxia of speech [14], cleft lip and palate [15] and general language development [16]. Despite the increasing abundance of such research, the focus remains primarily on therapy rather than screening, with very little work addressing the need for a broadly applicable screening tool [16,17]. A key challenge which has constrained clinical ASR is the scarcity of data available for the target population.…”

Section: Introduction (mentioning)

confidence: 99%

“…In this paper, we investigate whether an approach utilising transfer learning combined with previously developed methods for constrained decoding based on knowledge of the target text [17], can be applied to improve the detection of phonological errors in children's speech. A large and readily available dataset of adult Australian speech is used to train a base DNN, which is then transferred to train our target DNN of disordered Australian children's speech.…”

Section: Introduction (mentioning)

confidence: 99%

“…A large and readily available dataset of adult Australian speech is used to train a base DNN, which is then transferred to train our target DNN of disordered Australian children's speech. Figure 1 presents the three stage assessment system (the 'proof of concept' which was developed in [17] and [22]). This paper presents improvements to the initial stage, whilst maintaining the novel constrained HMM decoder and phonological error pattern (PEP) detection stages from previous work.…”

Section: Introduction (mentioning)

confidence: 99%


Improving Child Speech Disorder Assessment by Incorporating Out-of-Domain Adult Speech

Smith, Sneddon, Ward et al. 2017

Interspeech 2017

Self Cite


This paper describes the continued development of a system to provide early assessment of speech development issues in children and better triaging to professional services. Whilst corpora of children's speech are increasingly available, recognition of disordered children's speech is still a data-scarce task. Transfer learning methods have been shown to be effective at leveraging out-of-domain data to improve ASR performance in similar data-scarce applications. This paper combines transfer learning with previously developed methods for constrained decoding based on expert speech pathology knowledge and knowledge of the target text. Results of this study show that transfer learning with out-of-domain adult speech can improve phoneme recognition for disordered children's speech. Specifically, a Deep Neural Network (DNN) trained on adult speech and fine-tuned on a corpus of disordered children's speech achieved a phoneme error rate (PER) of 14.2%, compared with 16.3% for a DNN trained only on a children's corpus. Furthermore, this fine-tuned DNN also improved on the Hierarchical Neural Network based acoustic model previously used by the system, which had a PER of 19.3%. We close with a discussion of our planned future developments of the system.
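To put the reported figures in context, the sketch below computes a phoneme error rate as edit distance divided by reference length, and the relative reduction implied by moving from 16.3% to 14.2%. The abstract does not state how PER was scored, so the Levenshtein-based definition is an assumption (it is the conventional one), and the function names and the toy "cat" example are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Assumed PER definition: edit distance over reference length."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Fractional reduction of an error rate relative to a baseline."""
    return (baseline - improved) / baseline


# Toy example (not from the paper): target "cat" produced with one substitution.
toy_per = phoneme_error_rate(["k", "ae", "t"], ["t", "ae", "t"])

# Figures from the abstract: 16.3% (children-only DNN) vs 14.2% (fine-tuned DNN).
rel = relative_reduction(16.3, 14.2)
```

With the abstract's numbers, the absolute reduction is 2.1 percentage points, which corresponds to a relative reduction of roughly 12.9%.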

