The Lombard effect is one of the best-known effects of noise on speech production. Speech with the Lombard effect is more easily recognizable in noisy environments than normal natural speech. Our previous investigations showed that speech synthesis models might retain Lombard-effect characteristics. In this study, we investigate several speech models, such as harmonic, source-filter, and sinusoidal, applied to Lombard speech in the context of speech enhancement. For this purpose, 100 utterances of natural speech and 100 utterances with the Lombard effect induced are used. The goal of this study is to check to what extent speech utterances based on these models are recognizable and at what SNR (Signal-to-Noise Ratio) threshold a particular model stops working. To this end, the speech synthesized with these models and the Lombard speech are mixed with babble speech and street noise recordings at different SNRs. The quality of these models is measured using objective indicators as well as subjective tests. Since there is no standardized measure to apply to enhanced speech, an objective measure for assessing the speech quality of a model synthesizing Lombard speech characteristics, based on a feature vector, is proposed. Our approach is then compared with the standardized metric used in telecommunications as well as with subjective test results. The experimental investigations show the superiority of the source-filter models applied to synthesize Lombard speech over the other models utilized. Also, the proposed measure correlates more closely with the results of the subjective evaluation than the outcomes from the ITU-T P.563 recommendation. This was checked with an ANOVA statistical analysis.
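The mixing step described in this abstract (adding babble or street noise to speech at a chosen SNR) can be illustrated with a minimal sketch. It assumes both signals are plain lists of samples at the same sampling rate; the function names `mix_at_snr` and `measured_snr_db` are hypothetical and are not taken from the paper.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add.

    Returns the mixture and the scaled noise (assumed helper, not the paper's code).
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]  # trim noise to the speech length
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must both have non-zero power")
    # SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)), solved for gain
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))
```

Sweeping `snr_db` over a range of values would give one mixture per condition, which is how a threshold at which a model "stops working" could be probed.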
The aim of the work is to present a method of intelligent modification of the speech signal using features of speech produced in noise, based on the Lombard effect. The recordings comprised sets of words and sentences as well as disturbing signals, i.e., pink noise and so-called babble speech. The noise signal, calibrated to various levels at the speaker's ears, was played over two loudspeakers located 2 m away from the speaker. In addition, the recording session included utterances in quiet, which serve as a reference for the analysis of the speech signal recorded with the Lombard effect. As part of the analysis, the following parameters were examined with regard to prosody: fundamental frequency F0, formant frequencies F1 and F2, duration of the utterance, sound intensity, etc., taking into account individual sentences, words, and vowels. The PRAAT program was used to process and analyze speech signals. Next, a method for modifying speech with the features of speech spoken in noise was proposed. Subsequent analyses showed that noisy speech modified with the Lombard effect features is characterized by higher values of the PESQ (perceptual evaluation of speech quality) indicator than noisy speech without these features.
This paper aims to propose a noise profiling method based on machine learning (ML) that can be performed in near real time. To address challenges related to noise profiling effectively, we start with a critical review of the background literature. Then, we outline the experiment performed, which consists of two parts. The first part concerns the noise recognition model built upon several baseline classifiers and noise signal features derived from the Aurora noise dataset. Its purpose is to select the best-performing classifier in the context of noise profiling. Therefore, a comparison of all classifier outcomes is shown based on effectiveness metrics. Also, confusion matrices prepared for all tested models are presented. The second part of the experiment consists of selecting the algorithm that scored the best, i.e., Naive Bayes, which achieved an accuracy of 96.76%, and using it in a noise-type recognition model to demonstrate that it can perform in a stable way. Classification results are derived from real-life recordings performed in momentary and averaging modes. The key contribution is discussed regarding speech intelligibility improvements in the presence of noise, where identifying the type of noise is crucial. Finally, conclusions deliver the overall findings and future work directions.
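To make the classifier named in this abstract more concrete, here is a minimal sketch of a Naive Bayes noise-type classifier. The abstract does not say which Naive Bayes variant or which features were used, so the Gaussian likelihood, the function names, and the two-dimensional toy features in the usage note are all assumptions for illustration only.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Fit per-class log prior, feature means, and feature variances.

    Gaussian likelihoods are an assumption; the abstract only says "Naive Bayes".
    """
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))  # one tuple per feature dimension
        means = [sum(col) / n for col in columns]
        # var_floor avoids division by zero for constant features
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, features):
    """Return the label with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, means, variances) in model.items():
        score = log_prior
        for x, m, v in zip(features, means, variances):
            # log of the Gaussian density N(x; m, v)
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

With hypothetical feature vectors such as `[[0.1, 1.0], [0.2, 1.1], [0.9, 0.2], [1.0, 0.1]]` labeled `["babble", "babble", "street", "street"]`, `predict` assigns a new vector to whichever class explains it better under the independence assumption.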
The study aims to present a method of evaluating speech quality in noise and interference conditions based on similarity matrices. For that purpose, sets of words recorded in the presence of disturbing signals along with their clean counterparts are used. First, it is checked to what extent a correct alignment of signals with and without the Lombard effect is important when self-similarity matrices are utilized in the process of feature discerning. Then, self-similarity matrices based on the acoustic parameters concerning speech prosody and alignment are built. Next, a correlation check is performed for feature redundancy. Based on a reduced set of parameters, 2D maps of acoustic features are created for visualization purposes. This is also performed as a cross-check between different languages with and without Lombard speech. The analyses performed showed that self-similarity matrices may be applied for differentiating speech modified by the Lombard effect from non-Lombard utterances. This research is funded by the European Social Fund under the No 09.3.3-LMT-K-712 “Development of Competences of Scientists, other Researchers and Students through Practical Research Activities” measure.
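A self-similarity matrix of the kind mentioned in this abstract compares every analysis frame of an utterance with every other frame. The sketch below assumes each frame is a short vector of acoustic parameters and uses cosine similarity; the abstract does not state which similarity measure was used, so that choice and the function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def self_similarity_matrix(frames):
    """Square matrix whose entry [i][j] is the similarity of frame i and frame j."""
    n = len(frames)
    return [
        [cosine_similarity(frames[i], frames[j]) for j in range(n)]
        for i in range(n)
    ]
```

The resulting matrix is symmetric with ones on the diagonal for non-zero frames, so differences between Lombard and non-Lombard utterances would show up in the off-diagonal structure.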
The aim of this study is to address the question of whether the Lombard effect, either natural (an involuntary tendency to raise the uttered speech level in the presence of background noise) or synthetically created, is a remedy for speech communication in noise. To this end, a series of experiments examining the interference of different noises in synthesizing the Lombard effect is performed. Several steps are proposed. First, a recording session is carried out to prepare a dataset of speech with and without the Lombard effect in a controlled environment. Then, we detect frequency changes at each time point on the 2D speech representation. Frequency tracks in the speech signal are determined using the McAulay and Quatieri algorithm. To quantify the effect of noise on speech containing the Lombard effect, an average formant track error is calculated as an objective image quality metric. Three image assessment measures, i.e., the SSIM (Structural SIMilarity) index, RMSE (Root Mean Square Error), and dHash (Difference Hash), are employed for that purpose. Moreover, several spectral descriptors are analyzed in the context of Lombard speech and various types of noise to discuss their influence on speech. The investigations are concluded with an initial attempt at automatic noise profiling based on the method developed, followed by pitch modifications of the neutral speech signal depending on the profiling result and the frequency change trends obtained. An overlap-add synthesis in the STRAIGHT vocoder is used for synthesized speech.
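Two of the image measures named in this abstract, RMSE and dHash, are simple enough to sketch directly. The code below assumes the 2D speech representation is already available as a small grid of grayscale values (a list of rows); the usual dHash resizing step is left out, and the function names are hypothetical rather than taken from the paper.

```python
import math


def rmse(img_a, img_b):
    """Root mean square error between two equally sized 2D grids."""
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same shape")
    total, count = 0.0, 0
    for row_a, row_b in zip(img_a, img_b):
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    return math.sqrt(total / count)


def dhash_bits(img):
    """Difference hash: one bit per horizontally adjacent pair (1 if left > right).

    Assumes `img` is already downsampled, e.g. 8 rows by 9 columns for 64 bits.
    """
    return [
        1 if row[c] > row[c + 1] else 0
        for row in img
        for c in range(len(row) - 1)
    ]


def hamming_distance(bits_a, bits_b):
    """Number of positions at which two equal-length hashes differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(bits_a, bits_b))
```

RMSE reacts to any change in pixel values, while the Hamming distance between dHash bit strings reacts only to changes in the direction of local gradients, which is one reason the two measures can complement each other.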