Ensembling End-to-End Deep Models for Computational Paralinguistics Tasks: ComParE 2020 Mask and Breathing Sub-Challenges

Markitantov, Maxim; Dresvyanskiy, Denis; Mamontov, Danila; Kaya, Heysem; Minker, Wolfgang; Karpov, Alexey

doi:10.21437/interspeech.2020-2666

Cited by 25 publications

(26 citation statements)

References 18 publications

“…In another effort of correlating speech signals with breathing signals, an ensemble system with fusion at both feature and decision level of two approaches is presented by Markitantov et al. [15]. One of the two approaches is a 1D-CNN based end-to-end model having two LSTM layers stacked above it.…”

Section: Introduction (mentioning)

confidence: 99%
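
The statement above mentions decision-level fusion without describing it. As a rough illustration only, the sketch below assumes fusion means a weighted average of per-model prediction sequences (for example, frame-wise breathing-signal estimates). The function name `fuse_predictions` and the weights are hypothetical and are not taken from the cited paper.

```python
def fuse_predictions(model_outputs, weights=None):
    """Weighted average of per-model prediction sequences.

    model_outputs: list of equal-length lists, one per model (hypothetical data).
    weights: optional per-model weights; defaults to a uniform average.
    """
    n_models = len(model_outputs)
    if n_models == 0:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0 / n_models] * n_models
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    length = len(model_outputs[0])
    if any(len(out) != length for out in model_outputs):
        raise ValueError("all model outputs must have the same length")
    total = sum(weights)
    # Normalise by the weight sum so the weights need not add up to 1.
    return [
        sum(w * out[t] for w, out in zip(weights, model_outputs)) / total
        for t in range(length)
    ]


if __name__ == "__main__":
    # Two hypothetical models predicting a two-step signal.
    print(fuse_predictions([[0.0, 1.0], [1.0, 1.0]]))
```

In practice the weights would usually be tuned on a development set; the uniform default here is only a placeholder.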

“…The attention step is found to improve the metrics by 0.003 r-value absolute, from r = 0.728 to r = 0.731, and 0.726 % F1 value absolute, from 74.743 to 75.469 for the two tasks, respectively. All the three studies mentioned above [14], [15], [16] worked with the data set provided in the Breathing Sub-challenge of Interspeech 2020 ComParE [13].…”

Section: Introduction (mentioning)

confidence: 99%

AI-Based human audio processing for COVID-19: A comprehensive overview

Deshpande

Batliner

Schuller

2022

Pattern Recognition

The Coronavirus (COVID-19) pandemic impelled several research efforts, from collecting COVID-19 patients’ data to screening them for virus detection. Some COVID-19 symptoms are related to the functioning of the respiratory system that influences speech production; this suggests research on identifying markers of COVID-19 in speech and other human generated audio signals. In this article, we give an overview of research on human audio signals using ’Artificial Intelligence’ techniques to screen, diagnose, monitor, and spread the awareness about COVID-19. This overview will be useful for developing automated systems that can help in the context of COVID-19, using non-obtrusive and easy to use bio-signals conveyed in human non-speech and speech audio productions.

“…Markitantov et al. [58] submitted five different models to the MSC. These models are all based on two models, ResNet18v1 and ResNet18v2, which are variations of the standard ResNet18 [21].…”

Section: Challenge Results and Contributions (mentioning)

confidence: 99%

“…The approaches introduced in [58] are all generic audio-based approaches that depend on variations of the standard ResNet18 model. As such, they can easily be used for other audio tasks without much change.…”

Section: Challenge Results and Contributions (mentioning)

confidence: 99%

“… [57] 77.5 22.7 22.4 CNNs pretrained on AudioSet, Mixup, snapshots during training
6 Markitantov et al. [58] 75.9 16.4 31.9 Ensemble of ResNet18 variants, with -folds and different optimisers
9 Klumpp et al. [63] 75.4 21.8 27.4 RNN for phoneme recognition
11 Yang et al.…”

Section: Challenge Results and Contributions (mentioning)

confidence: 99%

Face mask recognition from audio: The MASC database and an overview on the mask challenge

Mohamed

Nessiem

Batliner

et al.

2022

Pattern Recognition

The sudden outbreak of COVID-19 has resulted in tough challenges for the field of biometrics due to its spread via physical contact, and the regulations of wearing face masks. Given these constraints, voice biometrics can offer a suitable contact-less biometric solution; they can benefit from models that classify whether a speaker is wearing a mask or not. This article reviews the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 COMputational PARalinguistics challengE (ComParE), which focused on the following classification task: Given an audio chunk of a speaker, classify whether the speaker is wearing a mask or not. First, we report the collection of the Mask Augsburg Speech Corpus (MASC) and the baseline approaches used to solve the problem, achieving a performance of Unweighted Average Recall (UAR). We then summarise the methodologies explored in the submitted and accepted papers that mainly used two common patterns: (i) phonetic-based audio features, or (ii) spectrogram representations of audio combined with Convolutional Neural Networks (CNNs) typically used in image processing. Most approaches enhance their models by adapting ensembles of different models and attempting to increase the size of the training data using various techniques. We review and discuss the results of the participants of this sub-challenge, where the winner scored a UAR of . Moreover, we present the results of fusing the approaches, leading to a UAR of . Finally, we present a smartphone app that can be used as a proof of concept demonstration to detect in real-time whether users are wearing a face mask; we also benchmark the run-time of the best models.
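
The abstract above reports results in Unweighted Average Recall (UAR), which is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The following is a minimal sketch of that metric; the function name and the class labels `"mask"` and `"clear"` are illustrative assumptions, not taken from the challenge code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = defaultdict(int)  # correctly predicted samples per true class
    support = defaultdict(int)  # total samples per true class
    for truth, pred in zip(y_true, y_pred):
        support[truth] += 1
        if truth == pred:
            correct[truth] += 1
    if not support:
        raise ValueError("no samples given")
    recalls = [correct[label] / support[label] for label in support]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical labels: 4 "mask" chunks (3 recognised), 2 "clear" chunks (1 recognised).
    y_true = ["mask", "mask", "mask", "mask", "clear", "clear"]
    y_pred = ["mask", "mask", "mask", "clear", "clear", "mask"]
    print(unweighted_average_recall(y_true, y_pred))
```

In this toy example the per-class recalls are 0.75 and 0.5, so the UAR is their plain average, whereas overall accuracy (4 of 6 correct) would weight the larger class more heavily.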

AI Hears Your Health: Computer Audition for Health Monitoring

Amiriparian

Schuller

2021

ICT for Health, Accessibility and Wellbeing
