Voice activity detection with quasi-quadrature filters and GMM decomposition for speech and noise

Makowski, Ryszard; Hossa, R.

doi:10.1016/j.apacoust.2020.107344

Cited by 9 publications

(3 citation statements)

References 18 publications

“…Semantic analysis procedures in radio content mainly involve voice detection, speech recognition and speaker identification tasks. Machine learning approaches based on clustering techniques that determine speech/non-speech frames were implemented for voice activity detection via Gaussian mixture models, Laplacian similarity matrices, expectation maximization algorithms, hidden Markov chains and artificial neural networks [26][27][28][29][30]. A more specific and interesting audio pattern that can be detected in audio signals, i.e., in broadcast programs, refers to phone line voices, due to the contained particular spectral audio properties [1,25,26].…”

Section: Background Work and Problem Definition (mentioning)

confidence: 99%

Extending Radio Broadcasting Semantics through Adaptive Audio Segmentation Automations

Kotsakis, Dimoulas. 2022. Knowledge

The present paper focuses on adaptive audio detection, segmentation and classification techniques in audio broadcasting content, dedicated mainly to voice data. The suggested framework addresses a real case scenario encountered in media services and especially radio streams, aiming to fulfill diverse (semi-) automated indexing/annotation and management necessities. In this context, aggregated radio content is collected, featuring small input datasets, which are utilized for adaptive classification experiments, without searching, at this point, for a generic pattern recognition solution. Hierarchical and hybrid taxonomies are proposed, firstly to discriminate voice data in radio streams and thereafter to detect single speaker voices, and when this is the case, the experiments proceed into a final layer of gender classification. It is worth mentioning that stand-alone and combined supervised and clustering techniques are tested along with multivariate window tuning, towards the extraction of meaningful results based on overall and partial performance rates. Furthermore, the current work via data augmentation mechanisms contributes to the formulation of a dynamic Generic Audio Classification Repository to be subjected, in the future, to adaptive multilabel experimentation with more sophisticated techniques, such as deep architectures.
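The citation statement above mentions Gaussian mixture models fitted with expectation maximization as one way to separate speech from non-speech frames. A minimal sketch of that general idea follows, assuming per-frame log energy as the only feature and a two-component one-dimensional mixture. It is a generic illustration, not the method of Makowski and Hossa or of the citing paper; the function names, the frame length and hop, and the rule that the higher-energy component is speech are all assumptions made for the example.

```python
import numpy as np


def frame_log_energy(signal, frame_len, hop):
    """Log10 mean-square energy of each frame (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array(
        [np.log10(np.mean(signal[s:s + frame_len] ** 2) + 1e-12) for s in starts]
    )


def fit_gmm2_1d(v, iters=100):
    """Fit a two-component 1-D Gaussian mixture to v with plain EM."""
    v = np.asarray(v, dtype=float)
    # Initialise the two means at the extremes of the data (assumption).
    mu = np.array([v.min(), v.max()])
    var = np.full(2, v.var() + 1e-6)
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        pdf = np.exp(-0.5 * (v[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = w * pdf
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0)
        w = nk / len(v)
        mu = (resp * v[:, None]).sum(axis=0) / nk
        var = (resp * (v[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var, resp


def gmm_vad(signal, frame_len=400, hop=160):
    """Boolean speech mask per frame from a two-component energy GMM."""
    energy = frame_log_energy(signal, frame_len, hop)
    _, mu, _, resp = fit_gmm2_1d(energy)
    speech = int(np.argmax(mu))  # assumption: louder component is speech
    return resp[:, speech] > 0.5
```

With 16 kHz audio, the default values correspond to 25 ms frames and a 10 ms hop. A single energy feature is easily fooled by loud non-speech sounds, which is one reason published detectors use richer features.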

“…There has been a set of research works on machine learning approaches applied to the broad voice analysis research area such as pathological voice detection [25,26], voice activity detection [27,28]. A number of studies have investigated voice analysis based on specific machine learning algorithms such as decision trees [29], support vector machine (SVM) [30,31], hidden Markov model (HMM) [32,33], Gaussian mixture model (GMM) [34,35], artificial neural networks (ANN) [36,37] and have reported high accuracy and performance [38][39][40].…”

Section: Introduction (mentioning)

confidence: 99%

Towards the Objective Speech Assessment of Smoking Status based on Voice Features: A Review of the Literature

Bullen, Chu, et al. 2023. Journal of Voice

Background and Objective: In smoking cessation clinical research and practice, objective validation of self-reported smoking status is crucial for ensuring the reliability of the primary outcome, that is, smoking abstinence. Speech signals convey important information about a speaker, such as age, gender, body size, emotional state, and health state. We investigated (1) if smoking could measurably alter voice features, (2) if smoking cessation could lead to changes in voice, and therefore (3) if the voice-based smoking status assessment has the potential to be used as an objective smoking cessation validation method. Methods: A systematic review of the scientific literature was conducted to compile studies on smoking status assessment based on voice features. We searched nine scientific databases for original studies involving the effects of smoking on voice features, the effects of smoking cessation on voice features. Results: A total of 34 studies were identified for review. We found that fundamental frequency, jitter, shimmer, harmonics to noise ratio, and other voice features are affected by smoking and could be used to assess smoking status. Conclusion: Speech assessment of smoking status based on voice features has potential as a smoking status validation method, as it is simple, reliable, and less time-consuming. Furthermore, this study provides recommendations for future research on the objective speech assessment of smoking status based on voice features.

“…Finally, the VAD decision is based on a threshold derived from the parameter contour at each utterance. This method is also used in [15], which proposed changing envelope calculation and utilized histograms and estimators of probability distributions to determine the detection threshold; citing that the SFF method is also used in the enhancement of speech intelligibility [16].…”

Section: Introduction (mentioning)

confidence: 99%

A robust voice activity detection based on single frequency filtering approach and the fractal dimension

Abajaddi, Elfahm, Ali, et al. 2022. Preprint

Speech activity detection is a crucial preprocessing step, in many scientific fields, such as speech recognition, audio forensics, audio conferencing, and text-to-speech applications. It can be used in speech processing to deactivate various operations during non-speaking sections. This paper proposes an accurate and robust approach that aims to classify voiced and unvoiced segments. For this purpose, novel algorithms are adopted that combine two approaches, the fractal dimension for the envelopes, these envelopes are obtained by the novel approach single frequency filtering, with high temporal and spectral resolution. To make a simple and fast decision between speech and non-speech segments, the fractal dimension is computed using the Katz algorithm. This parameter has shown its effectiveness in speech activity detection in continuous speech, but this work improves its performance notably. Two different corpora, The Texas Instruments Massachusetts Institute of Technology and the King Saud University Arabic speech, are used to assess the performance of the proposed method. The results of the proposed method show a reliable performance compared with well-known methods. The proposed approach can separate speech and non-speech segments in noisy and clean speech and does not need training data or any assumption at the beginning of the audio of the non-speech.
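The abstract above names the Katz algorithm as the way the fractal dimension is computed. As a rough sketch, the standard Katz estimate for a sampled sequence is FD = log10(n) / (log10(n) + log10(d / L)), where L is the total curve length, d is the largest distance from the first sample, and n is the number of steps. The code below assumes the common one-dimensional simplification in which distances are amplitude differences only; the single frequency filtering envelopes and the decision rule of the cited preprint are not reproduced, and the function names and frame parameters are hypothetical.

```python
import numpy as np


def katz_fd(x):
    """Katz fractal dimension of a 1-D sequence (amplitude distances only)."""
    x = np.asarray(x, dtype=float)
    steps = np.abs(np.diff(x))
    total = steps.sum()               # L: total length of the curve
    if total == 0.0:
        return 1.0                    # flat segment treated as a line (assumption)
    d = np.max(np.abs(x - x[0]))      # d: farthest excursion from the first sample
    n = len(steps)                    # n: number of steps along the curve
    return np.log10(n) / (np.log10(n) + np.log10(d / total))


def frame_katz_fd(signal, frame_len=400, hop=160):
    """Katz fractal dimension of each frame of a signal or envelope."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([katz_fd(signal[s:s + frame_len]) for s in starts])
```

A straight ramp gives a value of 1, and more irregular sequences give larger values. Degenerate inputs in which n times d equals L make the denominator zero, so a production version would need to guard that case. A detector would then compare the per-frame values against a threshold, which this sketch leaves out.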
