Acoustics-based automatic assessment is a highly desirable approach to detecting speech sound disorder (SSD) in children. The performance of an automatic speech assessment system depends greatly on the availability of a sufficient amount of properly annotated disordered speech, which is a critical problem, particularly for child speech. This paper presents a novel design for a child speech disorder detection system that requires only normal speech for model training. The system is based on a Siamese recurrent network, which is trained to learn the similarity and discrepancy of pronunciations between a pair of phones in the embedding space. To detect speech sound disorder, the trained network measures a distance that contrasts the test phone with the desired phone, and this distance is used to train a binary classifier. Speech attribute features are incorporated to measure pronunciation quality and provide diagnostic feedback. Experimental results show that a Siamese recurrent network with a combination of speech attribute features and phone posterior features attains the best detection accuracy of 0.941.
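The distance-then-classify step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance measure or the classifier, so cosine distance and a single accuracy-maximizing threshold are assumptions, and the function names `cosine_distance`, `fit_threshold`, and `predict` are hypothetical. The embeddings are assumed to come from an already trained network.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two embedding vectors (assumed distance measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def fit_threshold(distances, labels):
    """Pick the distance threshold that maximizes accuracy on labeled examples.

    labels: 1 = test phone deviates from the desired phone, 0 = acceptable.
    This stands in for the binary classifier; the real one may be more complex.
    """
    best_t, best_acc = None, -1.0
    for t in sorted(set(distances)):
        preds = [1 if d >= t else 0 for d in distances]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


def predict(distance, threshold):
    """Flag a phone as deviating when its distance reaches the threshold."""
    return 1 if distance >= threshold else 0
```

In use, each test phone embedding would be compared with the embedding of the desired phone, and the resulting distance passed to `predict` with the threshold returned by `fit_threshold`.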
Summary
Polyethylene glycol (PEG) is a phase change material with high phase change enthalpy and good compatibility with the environment. However, PEG exhibits relatively large supercooling, which limits its practical applications. To reduce the degree of supercooling, we use three different small molecules (acryloyl chloride, acetyl chloride, and thionyl chloride) to modify PEG and study the effects of the different end-group modifications on its phase change properties. Fourier‐transform infrared (FTIR) and proton nuclear magnetic resonance (1H‐NMR) spectroscopy are used to confirm the molecular structure of PEG with different molecular weights and functionalities after chemical modification. The crystal structures of PEG before and after modification are examined by X‐ray diffractometry (XRD) and show no change. Differential scanning calorimetry (DSC) results show that end-group modification is quite effective at mitigating the supercooling of PEG with two ‐OH end groups but not effective for PEG with a single ‐OH end group. A mechanism for the change in supercooling behavior is proposed. The costs of the different modification methods are estimated and compared.
This paper describes an investigation of automatic speech assessment for people with aphasia (PWA) using a DNN-based automatic speech recognition (ASR) system. The main problems addressed are the lack of training speech in the intended application domain and the associated degradation of ASR performance on the impaired speech of PWA. We adopt the TDNN-BLSTM structure for acoustic modeling and apply multi-task learning with a large amount of domain-mismatched data. This leads to a significant improvement in recognition accuracy compared with a conventional single-task learning DNN system. To facilitate the extraction of robust text features for quantifying language impairment in PWA speech, we propose to incorporate N-best hypotheses and a confusion network representation of the ASR output. The severity of impairment is predicted from text features and suprasegmental duration features using different regression models. Experimental results show a high correlation of 0.842 between the predicted severity level and the subjective Aphasia Quotient score.
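As a rough illustration of the evaluation reported above, the sketch below computes a correlation coefficient between predicted severity levels and reference scores. The abstract does not state which correlation measure was used, so Pearson's r is an assumption here, and the function name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Here `xs` would hold the severity levels predicted by a regression model and `ys` the corresponding Aphasia Quotient scores for the same speakers.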