Highlights

- New software paradigm for linguistic/phonetic tools: webservices
- Webservices encapsulating basic processing tools
- Webservices as building blocks for complex systems
- Web interface as front end to webservices or systems of webservices
- BAS CLARIN webservices: a free service to the scientific community
- Multilingual automatic segmentation and labelling of speech into words and phones
- Multilingual automatic text-to-phoneme conversion webservice
- Multilingual syllabification webservice
- Free German speech synthesis webservice

The services include automatic segmentation of speech, grapheme-to-phoneme conversion, syllabification, speech synthesis, and optimal symbol sequence alignment.
We examine prosodic entrainment in cooperative game dialogs using new feature sets describing register, pitch accent shape, and rhythmic aspects of utterances. For these as well as for established features, we present entrainment profiles to detect within- and across-dialog entrainment by the speakers' gender and role in the game. It turned out that the feature sets undergo entrainment in different quantitative and qualitative ways, which can partly be attributed to their different functions. Furthermore, interactions between speaker gender and role (describer vs. follower) suggest gender-dependent strategies in cooperative, solution-oriented interactions: female describers entrain most, male describers least. Our data suggest a slight advantage of the latter strategy for task success.
In this contribution, we investigate the effectiveness of deep fusion of text and audio features for categorical and dimensional speech emotion recognition (SER). We propose a novel, multistage fusion method where the two information streams are integrated in several layers of a deep neural network (DNN), and contrast it with a single-stage one where the streams are merged in a single point. Both methods depend on extracting summary linguistic embeddings from a pre-trained BERT model, and conditioning one or more intermediate representations of a convolutional model operating on log-Mel spectrograms. Experiments on the widely used IEMOCAP and MSP-Podcast databases demonstrate that the two fusion methods clearly outperform a shallow (late) fusion baseline and their unimodal constituents, both in terms of quantitative performance and qualitative behaviour. Our accompanying analysis further reveals a hitherto unexplored role of the underlying dialogue acts on unimodal and bimodal SER, with different models showing a biased behaviour across different acts. Overall, our multistage fusion shows better quantitative performance, surpassing all alternatives on most of our evaluations. This illustrates the potential of multistage fusion in better assimilating text and audio information.
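The difference between the two fusion schemes can be shown structurally. The following is a toy, framework-free sketch in plain Python: layer widths, the number of merge points, and the random stand-in weights are illustrative assumptions, not the authors' architecture. The only point it makes is where the text embedding joins the audio stream: once before the output (single-stage) versus re-injected at every layer (multistage).

```python
import random

rng = random.Random(0)  # fixed seed: untrained stand-in weights


def relu(vec):
    return [max(0.0, x) for x in vec]


def dense(vec, out_dim):
    """Random projection standing in for a trained dense layer."""
    return [sum(rng.uniform(-0.1, 0.1) * v for v in vec)
            for _ in range(out_dim)]


def single_stage_fusion(audio, text, n_layers=3, hidden=8, n_classes=4):
    """Audio is processed alone; the text embedding is merged at a single
    point, just before the output layer."""
    h = audio
    for _ in range(n_layers):
        h = relu(dense(h, hidden))
    return dense(h + text, n_classes)


def multistage_fusion(audio, text, n_layers=3, hidden=8, n_classes=4):
    """The text embedding is concatenated onto the audio representation at
    every layer -- the several-merge-points idea, in skeletal form."""
    h = audio
    for _ in range(n_layers):
        h = relu(dense(h + text, hidden))
    return dense(h + text, n_classes)
```

In a real model the projections would be trained jointly, the audio path would be a CNN over log-Mel spectrograms, and the text vector a BERT summary embedding; the sketch only fixes the merge topology.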
Many past studies have sought to determine the factors that affect f0 declination, and the physiological underpinnings of the phenomenon. This study assessed the relation between respiration and f0 declination by means of simultaneous acoustic and respiratory recordings from read and spontaneous speech from speakers of German. Within the respective
We examined how well prosodic boundary strength can be captured by two declination stylization methods and by four different representations of pitch register. In the stylization proposed by Lieberman et al. (1985), base- and topline are fitted to peaks and valleys of the pitch contour, whereas in Reichel & Mády (2013) these lines are fitted to medians below and above certain pitch percentiles. From each of the stylizations, four feature pools were induced representing different aspects of register discontinuity at word boundaries: discontinuities related to the base-, mid-, and topline, as well as to the range between base- and topline. Concerning stylization, the median-based fitting approach turned out to be more robust with respect to declination line crossing errors and yielded base-, topline-, and range-related discontinuity characteristics with higher correlations to perceived boundary strength. Concerning register representation, for the peak/valley fitting approach the base- and topline patterns showed weaker correspondences to boundary strength than the other feature pools. We furthermore trained generalized linear regression models for boundary strength prediction on each feature pool. It turned out that neither the stylization method nor the register representation had a significant influence on the overall good prediction performance.
With the COVID-19 pandemic, several research teams have reported successful advances in automated recognition of COVID-19 from voice. The resulting voice-based screening tools for COVID-19 could support large-scale testing efforts. While the capabilities of machines on this task are progressing, we address the so-far unexplored question of whether human raters can distinguish speakers who tested positive for COVID-19 from those who tested negative on the basis of voice samples, and compare their performance to a machine learning baseline. To account for the challenging symptom similarity between COVID-19 and other respiratory diseases, we use a carefully balanced dataset of voice samples in which COVID-19 positive and negative tested speakers are matched by their symptoms, alongside COVID-19 negative speakers without symptoms. Both the human raters and the machine struggle to reliably identify COVID-19 positive speakers in our dataset. These results indicate that particular attention should be paid to the distribution of symptoms across all speakers of a dataset when assessing the capabilities of existing systems. Identifying the acoustic manifestations of COVID-19-related symptoms might be the key to reliable voice-based COVID-19 detection in the future, by both trained human raters and machine learning models.
In Hungarian intonation research, the common framework developed by Varga (2002; [1]) aims to categorize intonation within the domain of accent groups by character contours. We propose a linear parameterization of a subset of these contours derived from polynomial stylization. These parameters were used to train classification trees and support vector machines for contour prediction. Parameter extraction and training were carried out on the original F0 contours of spontaneous speech data as well as on three differently normalized variants suppressing fundamental frequency level and range effects. The highest accuracies were obtained for classification trees and F0 residuals after midline subtraction, but the overall performance was rather poor. Nevertheless, a significant improvement of the results was achieved by applying a hidden Markov model to predict the correct label sequence from the partly erroneous classification output.
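The polynomial stylization step amounts to an ordinary least-squares fit of a low-order polynomial to the F0 samples of an accent group; the resulting coefficients (level, slope, curvature) can then serve as linear contour parameters. A stdlib-only sketch follows, solving the normal equations by Gaussian elimination; the polynomial order and solver are illustrative choices, not the authors' exact implementation.

```python
def polyfit(xs, ys, deg=2):
    """Least-squares polynomial fit via normal equations.
    Returns coefficients [c0, c1, ..., c_deg] for
    y = c0 + c1*x + ... + c_deg*x**deg."""
    n = deg + 1
    # Normal equations A c = b from the power sums of xs.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coefs = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * coefs[j] for j in range(i + 1, n))
        coefs[i] = (b[i] - s) / A[i][i]
    return coefs
```

Applied to the F0 contour of one accent group (with time normalized to, say, [0, 1]), `c0` captures register level, `c1` the overall rise or fall, and `c2` the contour's curvature; these are the kind of linear parameters a contour classifier could be trained on.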