Tone recognition is a core function in Chinese speech perception. People with sensorineural hearing loss (SNHL) often perceive tone less accurately than listeners with normal hearing. Automatic tone enhancement would help them understand Chinese speech better. In this paper, we focus on a tone enhancement model for Chinese disyllabic words. We first analyze the acoustic features related to tone perception. Using agglomerative hierarchical clustering, the first and second syllables of disyllabic words are each grouped into 6 clusters. Discriminative features of these clusters are determined experimentally from a set of candidate features related to tone perception, such as pitch value, pitch range, and the position of the pitch minimum. We further propose a practical tone enhancement model built on these discriminative features: 1) the distance between an input pitch contour and the centroid of each cluster is calculated; 2) the contour is assigned to the cluster with the smallest distance; 3) the pitch contour is modified for tone enhancement using TD-PSOLA, with the model parameters corresponding to that cluster. Both statistical and subjective experiments show that a higher tone recognition hit rate is obtained after tone enhancement with the proposed model. In particular, the proposed model does not require a conventional tone recognition step, which makes it more convincing and less laborious.
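Steps 1) and 2) of the model amount to nearest-centroid classification. The following is a minimal sketch of that idea only, not the authors' implementation. The abstract does not state the distance measure, so Euclidean distance is an assumption here, and the function name `nearest_cluster`, the three-value feature layout, and the centroid values are all hypothetical. Three centroids are shown for brevity, where the paper uses 6 per syllable position.

```python
import math


def nearest_cluster(features, centroids):
    """Return (index, distance) of the centroid closest to `features`.

    Euclidean distance is an assumption; the abstract does not name the metric.
    """
    if not centroids:
        raise ValueError("at least one centroid is required")
    best_index, best_distance = -1, math.inf
    for index, centroid in enumerate(centroids):
        if len(centroid) != len(features):
            raise ValueError("feature and centroid dimensions differ")
        distance = math.dist(features, centroid)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index, best_distance


# Hypothetical centroids, one tuple per cluster:
# (mean pitch in Hz, pitch range in Hz, relative position of the pitch minimum)
EXAMPLE_CENTROIDS = [
    (220.0, 20.0, 0.9),
    (180.0, 60.0, 0.1),
    (150.0, 40.0, 0.5),
]

# Classify one hypothetical input contour by its feature vector.
cluster, distance = nearest_cluster((178.0, 55.0, 0.2), EXAMPLE_CENTROIDS)
print(cluster, round(distance, 2))
```

Because the raw features have different units, a real system would probably scale them before computing distances. The selected cluster index would then pick the parameter set passed to the TD-PSOLA pitch modification in step 3), which is not sketched here.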
Selecting effective emotional features is important for improving the performance of automatic speech emotion recognition. Although various feature dimension reduction algorithms have been proposed to improve emotion recognition accuracy, most of them have defects, such as a strong negative impact on the recognition rate or high computational complexity. To address this, two dimension reduction algorithms based on PCA (principal component analysis) and KPCA (kernel PCA) are compared in this paper. The original features extracted from the databases are transformed by PCA or KPCA. The weights of the new features in the transformation matrix are calculated and ranked, and features are chosen according to this ranking. Experimental results show that feature dimension reduction can make a major contribution to the accuracy of speech emotion recognition, and that KPCA slightly outperforms PCA in both hit rate and the number of remaining dimensions.
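The PCA-based ranking described above can be illustrated with a small sketch. The abstract does not give the exact weighting rule, so the rule used here (ranking each original feature by the absolute value of its loading on the first principal component) is an assumption, and the names `covariance_matrix`, `first_component`, `rank_features`, and the example data are hypothetical. The first component is found by power iteration to keep the example free of dependencies. The KPCA variant is not covered.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of `rows` (one feature vector per row)."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("at least two samples are required")
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of `cov` by power iteration."""
    d = len(cov)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # all features are constant; keep the current vector
            break
        vec = [x / norm for x in nxt]
    return vec


def rank_features(rows):
    """Feature indices ordered by absolute loading on the first component."""
    loadings = first_component(covariance_matrix(rows))
    return sorted(range(len(loadings)), key=lambda j: abs(loadings[j]), reverse=True)


# Hypothetical data: feature 0 varies most, feature 1 follows it with a
# smaller spread, and feature 2 is nearly constant.
EXAMPLE_ROWS = [
    (1.0, 0.5, 0.10),
    (2.0, 0.9, 0.11),
    (3.0, 1.6, 0.10),
    (4.0, 2.1, 0.09),
    (5.0, 2.4, 0.10),
]

ranking = rank_features(EXAMPLE_ROWS)
print(ranking)
```

Features at the end of the ranking would be the candidates to drop. In practice the features would usually be standardized first, since unscaled PCA favors whichever feature has the largest numeric variance, and several components would be weighted rather than only the first.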