Evaluation of Lombard Speech Models in the Context of Speech in Noise Enhancement

Korvel, Gražina; Kąkol, Krzysztof; Kurasova, Olga; Kostek, Bożena

doi:10.1109/access.2020.3015421

Cited by 12 publications

(5 citation statements)

References 42 publications

“…Since the discovery of LE, this phenomenon has been extensively studied by a wide range of specialists seeking to improve the performance of automatic speech recognition systems in noisy environments (Maheswari et al, 2021) or to increase speech intelligibility by converting the speaking style from normal to Lombard speech (Li et al, 2020; Kąkol et al, 2020). Also, the basic idea was that LE might be applied to speech synthesizers, allowing them to adapt to noisy conditions (…, Paul et al, 2020).…”

Section: Introduction (mentioning)

confidence: 99%

Lombard effect – is it a remedy for speech communication in noisy environments?

Korvel, Kąkol, Treigys, et al. 2023

Preprint

The aim of this study is to address the question of whether the Lombard effect, either natural (an involuntary tendency to raise the level of uttered speech in the presence of background noise) or synthetically created, is a remedy for speech communication in noise. To this end, a series of experiments is performed to examine how different noises interfere with the synthesis of the Lombard effect. Several steps are proposed. First, a recording session is carried out to prepare a dataset of speech with and without the Lombard effect in a controlled environment. Then, frequency changes are detected at each time point of the 2D speech representation. Frequency tracks in the speech signal are determined using the McAulay and Quatieri algorithm. To quantify the effect of noise on speech containing the Lombard effect, an average formant track error is calculated as an objective image quality metric. Three image assessment measures, i.e., SSIM (Structural SIMilarity) index, RMSE (Root Mean Square Error), and dHash (Difference Hash), are employed for that purpose. Moreover, several spectral descriptors are analyzed in the context of Lombard speech and various types of noise to discuss their influence on speech. The investigations conclude with an initial attempt at automatic noise profiling based on the method developed, followed by pitch modifications of the neutral speech signal depending on the profiling result and the frequency change trends obtained. Overlap-add synthesis in the STRAIGHT vocoder is used to produce the synthesized speech.
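
The abstract lists RMSE and dHash (alongside SSIM) as the image measures used to compare speech representations. A minimal pure-Python sketch of those two follows, assuming each spectrogram is a 2D list of magnitudes with the same shape. The function names, the nearest-neighbour resampling, and the conventional 8 x 9 dHash grid are illustrative assumptions, not the authors' implementation; SSIM is omitted for brevity.

```python
from typing import List

Matrix = List[List[float]]  # a spectrogram treated as a grayscale image


def rmse(a: Matrix, b: Matrix) -> float:
    """Root mean square error between two equally shaped 2D arrays."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("inputs must have the same shape")
    sq = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return (sum(sq) / len(sq)) ** 0.5


def resize_nearest(img: Matrix, rows: int, cols: int) -> Matrix:
    """Nearest-neighbour resampling, a simple stand-in for proper image resizing."""
    h, w = len(img), len(img[0])
    return [[img[r * h // rows][c * w // cols] for c in range(cols)] for r in range(rows)]


def dhash(img: Matrix, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pixel pair (right > left)."""
    small = resize_nearest(img, hash_size, hash_size + 1)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(right > left)
    return bits


def hamming(h1: int, h2: int) -> int:
    """Number of differing bits between two hashes (0 means identical hashes)."""
    return bin(h1 ^ h2).count("1")


if __name__ == "__main__":
    # Hypothetical toy "spectrograms": a pattern and the same pattern with a constant offset.
    clean = [[float((r * c) % 7) for c in range(32)] for r in range(16)]
    shifted = [[v + 0.5 for v in row] for row in clean]
    print(rmse(clean, shifted))                   # 0.5
    print(hamming(dhash(clean), dhash(shifted)))  # 0: dHash ignores a uniform offset
```

The two measures are complementary: RMSE reacts to any change in level, whereas dHash keeps only the sign of row-wise gradients, so it is insensitive to a uniform offset but sensitive to changes in the shape of the spectral pattern.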

“…In contrast, the LE may create problems for systems that detect speech in noise automatically but were not trained on data related to the LE (Vlaj and Kacic, 2011; Marxer et al, 2018; Korvel et al, 2020; Maheswari et al, 2020). Such hyper-articulation impairs the performance of speech recognition systems (Maheswari et al, 2020), so it is essential to train them on data that include Lombard-related features.…”

Section: Introduction (mentioning)

confidence: 99%

Investigation of the Lombard effect based on a machine learning approach

Korvel, Treigys, Kąkol, et al. 2023

International Journal of Applied Mathematics and Computer Science

The Lombard effect is an involuntary increase in the speaker's pitch, intensity, and duration in the presence of noise. It makes it possible to communicate more effectively in noisy environments. This study aims to investigate an efficient method for detecting the Lombard effect in uttered speech. The influence of interfering noise, room type, and the speaker's gender on the detection process is examined. First, acoustic parameters related to speech changes produced by the Lombard effect are extracted. Mid-term statistics are computed from these parameters and used to construct a self-similarity matrix, which constitutes the input data for a convolutional neural network (CNN). The self-similarity-based approach is then compared with two other methods, i.e., spectrograms used as input to the CNN, and speech acoustic parameters combined with the k-nearest neighbors algorithm. The experimental investigations show the superiority of the self-similarity approach applied to Lombard effect detection over the other two methods. Moreover, the small standard deviation values for the self-similarity approach indicate that its high accuracies are stable.
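
The feature pipeline described in this abstract (frame-level acoustic parameters, then mid-term statistics, then a self-similarity matrix used as CNN input) can be sketched as follows. Assumptions, not taken from the paper: the parameters arrive as per-frame feature vectors that are already normalized, mid-term windows are non-overlapping blocks of `win` frames summarized by mean and standard deviation, and similarity is cosine similarity. All names are hypothetical, and the CNN stage is omitted.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mid_term_stats(frames: Sequence[Sequence[float]], win: int) -> List[Vector]:
    """Summarize non-overlapping blocks of `win` frames by per-feature mean and std.

    Trailing frames that do not fill a whole block are dropped.
    """
    segments: List[Vector] = []
    for start in range(0, len(frames) - win + 1, win):
        block = frames[start:start + win]
        n_feat = len(block[0])
        means = [sum(f[i] for f in block) / win for i in range(n_feat)]
        stds = [math.sqrt(sum((f[i] - means[i]) ** 2 for f in block) / win)
                for i in range(n_feat)]
        segments.append(means + stds)
    return segments


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def self_similarity(segments: List[Vector]) -> List[Vector]:
    """Square matrix S[i][j]: similarity of mid-term vectors i and j (a 2D 'image')."""
    return [[cosine(a, b) for b in segments] for a in segments]


if __name__ == "__main__":
    # Hypothetical per-frame features [pitch, intensity], already normalized.
    frames = [[0.2, 0.5]] * 6 + [[0.9, 0.8]] * 6
    S = self_similarity(mid_term_stats(frames, win=3))
    for row in S:
        print(" ".join(f"{v:.2f}" for v in row))
```

The matrix is symmetric with ones on the diagonal, and blocks of high similarity mark stretches of speech with similar parameter statistics; in this sketch it would be passed to a classifier as a single-channel image.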

“…However, it should be remembered that the best judge of speech intelligibility is the human ear. Therefore, there have been many attempts to find a correlation between objective measurement results and subjective evaluation [5][6][7][8][9]. In general, the STI value can be determined in two ways, i.e., the direct method based on modulated signals or the indirect method based on the impulse response, according to the IEC 60268-16 standard [10].…”

Section: Introduction (mentioning)

confidence: 99%

A Novel Method for Intelligibility Assessment of Nonlinearly Processed Speech in Spaces Characterized by Long Reverberation Times

Kurowski, Kotus, Odya, et al. 2022

Sensors

Self Cite

Objective assessment of speech intelligibility is a complex task that requires taking into account a number of factors, such as the different perception of each speech sub-band by the human auditory system or the different physical properties of each frequency band of a speech signal. Currently, the state-of-the-art method for assessing the quality of speech transmission is the speech transmission index (STI). It is a standardized way of objectively measuring the quality of, e.g., the acoustic adaptation of conference rooms or of public address systems. The wide use of this measure and its implementation in numerous measurement devices make STI a popular choice when the speech-related quality of rooms has to be estimated. However, the STI measure has a significant drawback that excludes it from some use cases. For instance, if one wants to enhance speech intelligibility with a nonlinear digital processing algorithm, the STI method is not suitable for measuring the impact of such an algorithm, as it requires that the measurement signal not be altered in a nonlinear way. Consequently, when a nonlinear speech enhancement algorithm has to be tested, the STI, the standard way of estimating speech transmission quality, cannot be used. In this work, we propose a method based on STI but modified so that it can be used to estimate the performance of nonlinear speech intelligibility enhancement methods. The proposed approach is based on a broadband comparison of the cumulated energy of the transmitted envelope modulation and the received modulation; hence we call it broadband STI (bSTI). Its credibility with regard to signals altered by the environment, or to nonlinear speech changed by a DSP algorithm, is checked through a comparative analysis of ten selected impulse responses for which a baseline STI value was known.
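
Since bSTI is checked against baseline STI values of impulse responses, it may help to see the classic indirect STI route in a few lines. Schroeder's formula gives the modulation transfer value m(F) = |sum h[n]^2 exp(-j 2 pi F n / fs)| / sum h[n]^2 for each modulation frequency F; each m is turned into an apparent SNR of 10 log10(m / (1 - m)) dB, clipped to +/-15 dB, and mapped linearly to a transmission index. This is a heavily simplified sketch, not the bSTI method proposed in the paper and not a compliant IEC 60268-16 implementation: it treats a single broadband response, averages the 14 standard modulation frequencies with equal weight, and omits octave-band filtering, band weighting, redundancy correction, and auditory masking. The function names and the exponential-decay test response are hypothetical.

```python
import cmath
import math
from typing import List, Sequence

# The 14 STI modulation frequencies (Hz), one-third-octave spaced from 0.63 to 12.5 Hz.
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5]


def mtf_from_ir(h: Sequence[float], fs: float, f_mod: float) -> float:
    """Schroeder's formula: modulation transfer at f_mod from impulse response h."""
    energy = [x * x for x in h]
    total = sum(energy)
    acc = sum(e * cmath.exp(-2j * math.pi * f_mod * n / fs) for n, e in enumerate(energy))
    return abs(acc) / total


def transmission_index(m: float) -> float:
    """Map a modulation transfer value to a transmission index in [0, 1]."""
    m = min(max(m, 1e-12), 1 - 1e-12)   # keep log10 and the division finite
    snr = 10 * math.log10(m / (1 - m))   # apparent SNR in dB
    snr = min(max(snr, -15.0), 15.0)     # clip to the +/-15 dB range
    return (snr + 15.0) / 30.0


def simplified_sti(h: Sequence[float], fs: float) -> float:
    """Equal-weight average of transmission indices (no octave bands, no weighting)."""
    tis = [transmission_index(mtf_from_ir(h, fs, f)) for f in MOD_FREQS]
    return sum(tis) / len(tis)


def exp_decay_ir(rt60: float, fs: float, seconds: float) -> List[float]:
    """Hypothetical test input: an idealized exponential envelope with the given RT60."""
    # Energy falls by 60 dB over rt60, so amplitude decays as exp(-6.91 * t / rt60).
    k = 3 * math.log(10) / rt60
    return [math.exp(-k * n / fs) for n in range(int(seconds * fs))]


if __name__ == "__main__":
    fs = 1000.0  # sample rate of the synthetic envelope (assumed)
    for rt60 in (0.5, 1.0, 2.0):
        print(rt60, round(simplified_sti(exp_decay_ir(rt60, fs, 3.0), fs), 3))
```

For the exponential model, m(F) approaches the textbook value 1 / sqrt(1 + (2 pi F RT60 / 13.8)^2), so longer reverberation lowers the index. The final step also shows why the classic method presupposes linear processing: m is read off the energy envelope of the impulse response, which only describes a linear, time-invariant transmission path.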
