The aim of this study was to characterise and compare widely used acquisition strategies for hyperpolarised ¹³C imaging. Free induction decay chemical shift imaging (FIDCSI), echo-planar spectroscopic imaging (EPSI), IDEAL spiral chemical shift imaging (ISPCSI) and spiral chemical shift imaging (SPCSI) sequences were designed for two different regimes of spatial resolution. Their characteristics were studied in simulations and in tumour-bearing rats after injection of hyperpolarised [1-¹³C]pyruvate on a clinical 3-T scanner. Two or three different sequences were used on the same rat in random order for direct comparison. The experimentally obtained lactate signal-to-noise ratio (SNR) in the tumour matched the simulations. Differences between the sequences were found mainly in encoding efficiency, gradient demand and artefact behaviour. Although ISPCSI and SPCSI offer high encoding efficiencies, these non-Cartesian trajectories are more prone than EPSI and FIDCSI to artefacts from various sources. If the encoding efficiency is sufficient for the desired application, EPSI has proven to be a robust choice; otherwise, faster spiral acquisition schemes are recommended. The conclusions of this work can be applied directly in clinical settings.
The objectives of this challenge paper are twofold: first, we apply a range of neural-network-based transfer learning approaches to cope with data scarcity in the field of speech emotion recognition; second, we fuse the obtained representations and predictions in early and late fusion strategies to assess the complementarity of the applied networks. In particular, we use our Deep Spectrum system to extract deep feature representations from the audio content of the 2020 EmotiW group-level emotion prediction challenge data. We evaluate a total of ten ImageNet-pre-trained Convolutional Neural Networks, including AlexNet, VGG16, VGG19 and three DenseNet variants, as audio feature extractors. We compare their performance to the ComParE feature set used in the challenge baseline, employing simple logistic regression models trained with Stochastic Gradient Descent as classifiers. With the help of late fusion, our approach improves the performance on the test set from 47.88 % to 62.70 % accuracy.
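The late fusion step described above combines the outputs of several independently trained classifiers. A common, generic way to do this is to average the per-class probabilities and pick the winning class; the sketch below illustrates that idea in plain Python (it is a minimal illustration, not the authors' exact implementation, and the optional `weights` parameter is an assumption):

```python
def late_fusion(prob_sets, weights=None):
    """Fuse per-class probability vectors from several classifiers.

    prob_sets: one probability vector per model for a single sample,
    e.g. [[0.2, 0.5, 0.3], [0.1, 0.7, 0.2]] for a 3-class problem.
    weights: optional per-model weights (defaults to a uniform average).
    Returns the index of the class with the highest fused probability.
    """
    n_models = len(prob_sets)
    if weights is None:
        weights = [1.0 / n_models] * n_models

    n_classes = len(prob_sets[0])
    fused = [0.0] * n_classes
    for w, probs in zip(weights, prob_sets):
        for c in range(n_classes):
            fused[c] += w * probs[c]

    # Argmax over the fused probability vector.
    return max(range(n_classes), key=lambda c: fused[c])


# Two 3-class models that disagree on confidence but agree on class 1:
print(late_fusion([[0.2, 0.5, 0.3], [0.1, 0.7, 0.2]]))  # → 1
```

Weighted variants (e.g. weighting each model by its development-set accuracy) follow from the same structure by passing non-uniform `weights`.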
The sudden outbreak of COVID-19 has posed tough challenges for the field of biometrics, owing to the virus's spread via physical contact and regulations mandating face masks. Given these constraints, voice biometrics can offer a suitable contact-less solution; it can benefit from models that classify whether a speaker is wearing a mask or not. This article reviews the Mask Sub-Challenge (MSC) of the INTERSPEECH 2020 COMputational PARalinguistics challengE (ComParE), which focused on the following classification task: given an audio chunk of a speaker, classify whether the speaker is wearing a mask or not. First, we report the collection of the Mask Augsburg Speech Corpus (MASC) and the baseline approaches used to solve the problem, with performance reported in terms of Unweighted Average Recall (UAR). We then summarise the methodologies explored in the submitted and accepted papers, which mainly followed two common patterns: (i) phonetic-based audio features, or (ii) spectrogram representations of audio combined with Convolutional Neural Networks (CNNs) typically used in image processing. Most approaches enhance their models by building ensembles of different models and by enlarging the training data using various augmentation techniques. We review and discuss the results of the participants of this sub-challenge, and we present the results of fusing the approaches. Finally, we present a smartphone app that serves as a proof-of-concept demonstration for detecting in real time whether users are wearing a face mask, and we benchmark the run-time of the best models.
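The ComParE challenges score submissions with Unweighted Average Recall (UAR): the mean of the per-class recalls, so that a minority class (e.g. "mask") counts as much as a majority class. A minimal stdlib sketch of the metric, for illustration only:

```python
def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls.

    Each class contributes equally regardless of how many
    samples it has, unlike plain accuracy.
    """
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Predictions for all samples whose true label is c.
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        recalls.append(sum(1 for p in preds_for_c if p == c) / len(preds_for_c))
    return sum(recalls) / len(recalls)


# 3 "mask" samples (2 correct) and 1 "no-mask" sample (correct):
y_true = ["mask", "mask", "mask", "no-mask"]
y_pred = ["mask", "no-mask", "mask", "no-mask"]
print(unweighted_average_recall(y_true, y_pred))  # → (2/3 + 1) / 2 ≈ 0.833
```

Note that plain accuracy on the same example would be 0.75; UAR is higher here because the single no-mask sample is fully recalled, which is exactly the class-balancing behaviour the metric is chosen for.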
Motivated by the attention mechanism of the human visual system and recent developments in the field of machine translation, we introduce our attention-based and recurrent sequence-to-sequence autoencoders for fully unsupervised representation learning from audio files. In particular, we test the efficacy of our novel approach on the task of speech-based sleepiness recognition. We evaluate the learnt representations from both autoencoders and conduct an early fusion to ascertain possible complementarity between them. In our framework, we first extract Mel-spectrograms from the raw audio. Second, we train recurrent autoencoders on these spectrograms, which are treated as time-dependent frequency vectors. Afterwards, we extract the activations of specific fully connected layers of the autoencoders, which represent the learnt features of the spectrograms for the corresponding audio instances. Finally, we train support vector regressors on these representations to obtain the predictions. On the development partition of the data, we achieve Spearman's correlation coefficients of .324, .283, and .320 with the targets on the Karolinska Sleepiness Scale by utilising the attention and non-attention autoencoders, and the fusion of both autoencoders' representations, respectively. In the same order, we achieve .311, .359, and .367 Spearman's correlation coefficients on the test data, indicating the suitability of our proposed fusion strategy.
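The evaluation metric reported above, Spearman's correlation coefficient, is simply the Pearson correlation computed on the ranks of the two variables, which makes it sensitive to any monotonic relationship between predictions and sleepiness targets. A self-contained sketch (with average ranks for ties, as in the standard definition; this is a generic illustration, not the authors' evaluation code):

```python
def rank(values):
    """1-based ranks, with tied values receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


print(spearman([1, 2, 3], [3, 2, 1]))   # → -1.0 (perfectly anti-monotonic)
print(spearman([1, 2, 3], [10, 20, 15]))  # → 0.5
```

In practice one would use `scipy.stats.spearmanr`, which additionally returns a p-value; the sketch above just makes the rank-then-correlate definition explicit.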