Abstract: In any speech coding system that adds noise to the speech signal, the primary goal should not be to reduce the noise power as much as possible, but to make the noise inaudible or to minimize its subjective loudness. "Hiding" the noise under the signal spectrum is feasible because of human auditory masking: sounds whose spectrum falls near the masking threshold of another sound are either completely masked by the other sound or reduced in loudness. In speech coding applications, the "other sound" is, of course…
“…To reduce the nonlinear relationship between the formant frequencies and the corresponding perceived approximant quality, all acoustic values were converted from Hz to Bark (Schroeder, Atal, & Hall, 1979):…”
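The Hz-to-Bark conversion attributed here to Schroeder, Atal, and Hall (1979) is commonly given as z = 7 · asinh(f / 650). A minimal sketch of that conversion, assuming this is the formula the cited study used (the function name and the example frequency are illustrative, not taken from the source):

```python
import math

def hz_to_bark_schroeder(f_hz: float) -> float:
    """Convert frequency in Hz to the Bark scale using the
    Schroeder, Atal & Hall (1979) approximation:
    z = 7 * asinh(f / 650)."""
    return 7.0 * math.asinh(f_hz / 650.0)

# Illustrative value: a frequency near a typical English /r/ F3
print(round(hz_to_bark_schroeder(1600.0), 2))  # → 11.43
```

Converting to Bark before statistical analysis compresses high frequencies the way the auditory system does, so equal Bark distances correspond roughly to equal perceptual distances.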
Although previous research indicates that Japanese speakers' second-language (L2) perception and production of English /ɹ/ may improve with increased L2 experience, relatively little is known about the fine phonetic details of their /ɹ/ productions, especially during the early phase of L2 speech learning. This cross-sectional study examined acoustic properties of word-initial /ɹ/ from 60 Japanese learners with a length of residence (LOR) between one month and one year in Canada. Their performance was compared to that of 15 native speakers of English and 15 low-proficiency Japanese learners of English. Formant frequencies (F2 and F3) and F1 transition durations were evaluated under three task conditions: word reading, sentence reading, and timed picture description. Learners with as little as two to three months of residence demonstrated target-like F2 frequencies. In addition, increased LOR was predictive of more target-like transition durations. Although the learners showed some improvement in F3 as a function of LOR, they did so mainly at a controlled level of speech production. The findings suggest that during the early phase of L2 segmental development, production accuracy is task-dependent and is influenced by the availability of L1 phonetic cues for redeployment in L2.
Abstract: The accuracy of noise estimation is important for the performance of a speech denoising system. Most noise estimators either overestimate or underestimate the noise level. Overestimating the noise magnitude causes serious speech distortion in the denoised output; conversely, underestimating it leaves a great quantity of residual noise. This study proposes employing a variable segment length for noise tracking and variable thresholds for determining speech presence probability, improving the noise-estimation performance of the minima-controlled-recursive-averaging (MCRA) algorithm. First, the fundamental frequency is estimated to decide whether a frame is a vowel. For vowel frames, the segment length is increased and the speech-presence threshold is decreased, which biases the noise estimate slightly low and thereby reduces speech distortion in the denoised speech. In noise-dominant regions, by contrast, the segment length decreases rapidly, so the noise estimate updates quickly and tracks noise variation well, allowing interference noise to be removed effectively during denoising. Experimental results show that the proposed approach improves the performance of the MCRA algorithm by preserving weak vowels and consonants, and the denoising performance is therefore improved.
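The recursive-averaging core that MCRA-style trackers build on can be sketched briefly. This is a minimal illustration of the time-recursive update only, not the paper's full method (the minima tracking, variable segment length, and variable thresholds described above are omitted); the function name and the smoothing value are assumptions:

```python
import numpy as np

def recursive_noise_update(noise_psd, noisy_psd, p_speech, alpha=0.85):
    """One time-recursive-averaging step of an MCRA-style noise tracker.

    noise_psd : previous noise PSD estimate, one value per frequency bin
    noisy_psd : current noisy-speech periodogram |Y(k,l)|^2
    p_speech  : speech-presence probability per bin, in [0, 1]
    alpha     : base smoothing factor (illustrative value)

    Where speech is likely present (p_speech near 1) the effective
    smoothing factor approaches 1, freezing the noise estimate; in
    noise-dominant bins it drops toward alpha, so the estimate
    updates quickly and tracks noise variation.
    """
    alpha_d = alpha + (1.0 - alpha) * p_speech
    return alpha_d * noise_psd + (1.0 - alpha_d) * noisy_psd
```

Lowering the speech-presence threshold in vowel frames pushes `p_speech` toward 1 there, which is exactly the mechanism that biases the estimate low and protects weak vowels from distortion.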
“…Again, a tan-sigmoid function was used in the nodes of the hidden layers and a linear function in the output layer node. Two nodes were […] Since the first publication of this table in 1961, many function approximations of the data, with varying degrees of accuracy, have been presented [7], [18], and [9]. The currently most widely used and accepted method for this conversion is outlined by Traunmüller in his paper "Analytical expressions for the Tonotopic Sensory Scale" [3].…”
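The Traunmüller expression referenced in the snippet is commonly stated as z = 26.81 · f / (1960 + f) − 0.53. A short sketch of that conversion, assuming this is the formula meant by reference [3] (the function name is illustrative; Traunmüller also gives small corrections at the extremes of the scale, which are omitted here):

```python
def hz_to_bark_traunmuller(f_hz: float) -> float:
    """Traunmuller's analytical Hz-to-Bark expression:
    z = 26.81 * f / (1960 + f) - 0.53."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

print(round(hz_to_bark_traunmuller(1000.0), 2))  # → 8.53
```

Unlike a lookup table, a closed-form expression like this can be evaluated at any frequency, which is why it displaced direct interpolation of the 1961 tabulated data.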
Abstract: The human auditory system perceives sound very differently from how sound is measured by modern audio sensing systems. The most commonly referenced aspects of auditory perception are loudness and pitch, which are related to the objective measures of sound pressure level and audio signal frequency. Here we describe an efficient and accurate method for converting the sensed factors of frequency and sound pressure level to perceived loudness and pitch. This method is achieved by modeling the physical auditory system and the biological neural network of the primary auditory cortex using artificial neural networks. The behavior of artificial neural networks both during and after the training process has been found to mimic that of biological neural networks, and this method will be shown to have certain advantages over previous methods for modeling auditory perception. This work describes the nature of artificial neural networks and investigates their suitability over other modeling methods for the task of perception modeling, taking into account development and implementation complexity. It will be shown that while known points on the perception scales of loudness and pitch can be used to objectively test the suitability of artificial neural networks, it is in estimating the perception of sound from unknown (or unseen) data points that this method excels.
Index terms: auditory system modeling, audio sensors, artificial neural networks, perception of sound, digital signal processing, loudness, pitch.
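The network architecture mentioned in the earlier snippet (tan-sigmoid hidden nodes, a linear output node) is a standard feed-forward design. A minimal sketch of its forward pass, assuming that architecture; the layer sizes, weights, and input values below are hypothetical, not taken from the paper:

```python
import numpy as np

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a small feed-forward network of the kind
    described: tan-sigmoid (tanh) hidden layer, linear output node."""
    h = np.tanh(w_hidden @ x + b_hidden)  # tan-sigmoid hidden activations
    return w_out @ h + b_out              # linear output (unbounded)

# Hypothetical shapes: 2 inputs (frequency, SPL) -> 8 hidden -> 1 output
rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 2))
b1 = np.zeros(8)
w2 = rng.normal(size=(1, 8))
b2 = np.zeros(1)

x = np.array([0.5, -0.2])  # normalized (frequency, SPL) input
y = mlp_forward(x, w1, b1, w2, b2)
print(y.shape)  # → (1,)
```

The tanh hidden layer gives the smooth nonlinearity needed to fit perceptual scales, while the linear output node lets the predicted loudness or pitch value range freely rather than being squashed into (−1, 1); interpolation between trained points is what the abstract claims as the method's strength.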