Abstract: In any speech coding system that adds noise to the speech signal, the primary goal should not be to reduce the noise power as much as possible, but to make the noise inaudible or to minimize its subjective loudness. "Hiding" the noise under the signal spectrum is feasible because of human auditory masking: sounds whose spectrum falls near the masking threshold of another sound are either completely masked by the other sound or reduced in loudness. In speech coding applications, the "other sound" is, of course…
“…To reduce the nonlinear relationship between the formant frequencies and the corresponding perceived approximant quality, all acoustic values were converted from Hz to Bark (Schroeder, Atal, & Hall, 1979):…”
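The Hz-to-Bark conversion attributed here to Schroeder, Atal, and Hall (1979) is commonly given as z = 7 · asinh(f / 650). A minimal sketch of that conversion, assuming this is the formula the cited study used (the function name and the example frequency are illustrative, not taken from the source):

```python
import math

def hz_to_bark_schroeder(f_hz: float) -> float:
    """Convert frequency in Hz to the Bark scale using the
    Schroeder, Atal & Hall (1979) approximation:
    z = 7 * asinh(f / 650)."""
    return 7.0 * math.asinh(f_hz / 650.0)

# Illustrative value: a frequency near a typical English /r/ F3
print(round(hz_to_bark_schroeder(1600.0), 2))  # → 11.43
```

Converting to Bark before statistical analysis compresses high frequencies the way the auditory system does, so equal Bark distances correspond roughly to equal perceptual distances.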
Although previous research indicates that Japanese speakers' second-language (L2) perception and production of English /ɹ/ may improve with increased L2 experience, relatively little is known about the fine phonetic details of their /ɹ/ productions, especially during the early phase of L2 speech learning. This cross-sectional study examined acoustic properties of word-initial /ɹ/ from 60 Japanese learners with a length of residence (LOR) between one month and one year in Canada. Their performance was compared to that of 15 native speakers of English and 15 low-proficiency Japanese learners of English. Formant frequencies (F2 and F3) and F1 transition durations were evaluated under three task conditions: word reading, sentence reading, and timed picture description. Learners with as little as two to three months of residence demonstrated target-like F2 frequencies. In addition, increased LOR was predictive of more target-like transition durations. Although the learners showed some improvement in F3 as a function of LOR, they did so mainly at a controlled level of speech production. The findings suggest that during the early phase of L2 segmental development, production accuracy is task-dependent and is influenced by the availability of L1 phonetic cues for redeployment in L2.
Abstract: The accuracy of noise estimation is important for the performance of a speech denoising system. Most noise estimators either overestimate or underestimate the noise level. Overestimating the noise magnitude causes serious speech distortion in the denoised output; conversely, underestimating it leaves a great quantity of residual noise. This study proposes employing a variable segment length for noise tracking and variable thresholds for determining speech presence probability, improving the noise-estimation performance of the minima-controlled-recursive-averaging (MCRA) algorithm. First, the fundamental frequency is estimated to decide whether a frame is a vowel. For vowel frames, the segment length is increased and the speech-presence threshold is decreased, which biases the noise estimate slightly low and thereby reduces speech distortion in the denoised speech. In noise-dominant regions, by contrast, the segment length decreases rapidly, so the noise estimate updates quickly and tracks noise variation well, allowing interference noise to be removed effectively during denoising. Experimental results show that the proposed approach improves the performance of the MCRA algorithm by preserving weak vowels and consonants, and the denoising performance is therefore improved.
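The recursive-averaging core that MCRA-style trackers build on can be sketched briefly. This is a minimal illustration of the time-recursive update only, not the paper's full method (the minima tracking, variable segment length, and variable thresholds described above are omitted); the function name and the smoothing value are assumptions:

```python
import numpy as np

def recursive_noise_update(noise_psd, noisy_psd, p_speech, alpha=0.85):
    """One time-recursive-averaging step of an MCRA-style noise tracker.

    noise_psd : previous noise PSD estimate, one value per frequency bin
    noisy_psd : current noisy-speech periodogram |Y(k,l)|^2
    p_speech  : speech-presence probability per bin, in [0, 1]
    alpha     : base smoothing factor (illustrative value)

    Where speech is likely present (p_speech near 1) the effective
    smoothing factor approaches 1, freezing the noise estimate; in
    noise-dominant bins it drops toward alpha, so the estimate
    updates quickly and tracks noise variation.
    """
    alpha_d = alpha + (1.0 - alpha) * p_speech
    return alpha_d * noise_psd + (1.0 - alpha_d) * noisy_psd
```

Lowering the speech-presence threshold in vowel frames pushes `p_speech` toward 1 there, which is exactly the mechanism that biases the estimate low and protects weak vowels from distortion.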
“…Again, a tan-sigmoid function was used in the nodes of the hidden layers and a linear function in the output layer node. Two nodes were […] Since the first publication of this table in 1961, many function approximations of the data, with varying degrees of accuracy, have been presented [7], [18], and [9]. The currently most widely used and accepted method for this conversion is outlined by Traunmüller in his paper "Analytical expressions for the Tonotopic Sensory Scale" [3].…”
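The Traunmüller expression referenced in the snippet is commonly stated as z = 26.81 · f / (1960 + f) − 0.53. A short sketch of that conversion, assuming this is the formula meant by reference [3] (the function name is illustrative; Traunmüller also gives small corrections at the extremes of the scale, which are omitted here):

```python
def hz_to_bark_traunmuller(f_hz: float) -> float:
    """Traunmuller's analytical Hz-to-Bark expression:
    z = 26.81 * f / (1960 + f) - 0.53."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

print(round(hz_to_bark_traunmuller(1000.0), 2))  # → 8.53
```

Unlike a lookup table, a closed-form expression like this can be evaluated at any frequency, which is why it displaced direct interpolation of the 1961 tabulated data.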
Abstract: The human auditory system perceives sound very differently from how sound is measured by modern audio sensing systems. The most commonly referenced aspects of auditory perception are loudness and pitch, which are related to the objective measures of sound pressure level and audio signal frequency. Here we describe an efficient and accurate method for converting the sensed factors of frequency and sound pressure level to perceived loudness and pitch. This method is achieved by modeling the physical auditory system and the biological neural network of the primary auditory cortex using artificial neural networks. The behavior of artificial neural networks both during and after the training process has been found to mimic that of biological neural networks, and this method will be shown to have certain advantages over previous methods for modeling auditory perception. This work describes the nature of artificial neural networks and investigates their suitability over other modeling methods for the task of perception modeling, taking into account development and implementation complexity. It will be shown that while known points on the perception scales of loudness and pitch can be used to objectively test the suitability of artificial neural networks, it is in estimating the perception of sound from unknown (or unseen) data points that this method excels.
Index terms: auditory system modeling, audio sensors, artificial neural networks, perception of sound, digital signal processing, loudness, pitch.
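The network architecture mentioned in the earlier snippet (tan-sigmoid hidden nodes, a linear output node) is a standard feed-forward design. A minimal sketch of its forward pass, assuming that architecture; the layer sizes, weights, and input values below are hypothetical, not taken from the paper:

```python
import numpy as np

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a small feed-forward network of the kind
    described: tan-sigmoid (tanh) hidden layer, linear output node."""
    h = np.tanh(w_hidden @ x + b_hidden)  # tan-sigmoid hidden activations
    return w_out @ h + b_out              # linear output (unbounded)

# Hypothetical shapes: 2 inputs (frequency, SPL) -> 8 hidden -> 1 output
rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 2))
b1 = np.zeros(8)
w2 = rng.normal(size=(1, 8))
b2 = np.zeros(1)

x = np.array([0.5, -0.2])  # normalized (frequency, SPL) input
y = mlp_forward(x, w1, b1, w2, b2)
print(y.shape)  # → (1,)
```

The tanh hidden layer gives the smooth nonlinearity needed to fit perceptual scales, while the linear output node lets the predicted loudness or pitch value range freely rather than being squashed into (−1, 1); interpolation between trained points is what the abstract claims as the method's strength.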