In the field of neurobiology of language, neuroimaging studies are generally based on stimulation paradigms consisting of at least two different conditions. Designing such paradigms can be very time-consuming, and this traditional approach is necessarily data-limited. In contrast, in computational and corpus linguistics, analyses are often based on large text corpora, which allow a vast variety of hypotheses to be tested by repeatedly re-evaluating the data set. Furthermore, text corpora also allow exploratory data analysis in order to generate new hypotheses. By drawing on the advantages of both fields, neuroimaging and computational corpus linguistics, we here present a unified approach combining continuous natural speech and MEG to generate a corpus of speech-evoked neuronal activity.
Research into the multimodal dimensions of human communication faces a set of distinctive methodological challenges. Collecting the datasets is resource-intensive, analysis often lacks peer validation, and the absence of shared datasets makes it difficult to develop standards. External validity is hampered by small datasets, yet large datasets are intractable. Red Hen Lab spearheads an international infrastructure for data-driven multimodal communication research, facilitating an integrated cross-disciplinary workflow. Linguists, communication scholars, statisticians, and computer scientists work together to develop research questions, annotate training sets, and develop pattern discovery and machine learning tools that handle vast collections of multimodal data, beyond the dreams of previous researchers. This infrastructure makes it possible for researchers at multiple sites to work in real time in transdisciplinary teams. We review the vision, progress, and prospects of this research consortium.
In this paper we describe SoMaJo, a rule-based tokenizer for German web and social media texts that was the best-performing system in the EmpiriST 2015 shared task with an average F1-score of 99.57. We give an overview of the system and the phenomena its rules cover, as well as a detailed error analysis. The tokenizer is available as free software.
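A rule-based tokenizer of this kind applies an ordered cascade of patterns so that web-specific units (URLs, hashtags, emoticons) are recognized before generic word and punctuation splitting. The following is a minimal, hypothetical sketch of that idea; the rule names and regexes are illustrative assumptions, not SoMaJo's actual (much larger, German-specific) rule set.

```python
import re

# Illustrative rule cascade: earlier rules take priority, so URLs,
# mentions/hashtags, and emoticons are protected from punctuation splitting.
# These patterns are toy assumptions, not the actual SoMaJo rules.
RULES = [
    ("url", re.compile(r"https?://\S+")),
    ("mention", re.compile(r"[@#]\w+")),
    ("emoticon", re.compile(r"[:;=][-o^]?[)(DPp]")),
    ("word", re.compile(r"\w+(?:[-']\w+)*")),
    ("punct", re.compile(r"[^\w\s]")),
]

def tokenize(text):
    """Greedy left-to-right tokenization using the first matching rule."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        for _, pattern in RULES:
            m = pattern.match(text, i)  # anchored match at position i
            if m:
                tokens.append(m.group())
                i = m.end()
                break
        else:
            i += 1  # skip a character no rule can handle
    return tokens
```

Because the rules are tried in order, `#nlp` is emitted as one token rather than being split into `#` and `nlp` by the punctuation rule.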
A computational model for auditory word recognition is presented that enhances the model of Arnold et al. (2017). Real-valued features are extracted from the speech signal instead of discrete features. One-hot encoding for words' meanings is replaced by real-valued semantic vectors, adding a small amount of noise to safeguard discriminability. Instead of learning with Rescorla-Wagner updating, we use multivariate multiple regression, which captures discrimination learning in the limit of experience. These new design features substantially improve prediction accuracy for words extracted from spontaneous conversations. They also provide enhanced temporal granularity, enabling the modeling of cohort-like effects. Clustering with t-SNE shows that the acoustic form space captures phone-like similarities and differences. Thus, wide learning with high-dimensional vectors, no hidden layers, and no abstract mediating phone-like representations is not only possible but also achieves excellent performance that approximates the lower bound of human accuracy on the challenging task of isolated word recognition.
A computational model for the comprehension of single spoken words is presented that builds on an earlier model using discriminative learning. Real-valued features are extracted from the speech signal instead of discrete features. Vectors representing word meanings using one-hot encoding are replaced by real-valued semantic vectors. Instead of incremental learning with Rescorla-Wagner updating, we use linear discriminative learning, which captures incremental learning at the limit of experience. These new design features substantially improve prediction accuracy for unseen words, and provide enhanced temporal granularity, enabling the modelling of cohort-like effects. Visualisation with t-SNE shows that the acoustic form space captures phone-like properties. Trained on 9 h of audio from a broadcast news corpus, the model achieves recognition performance that approximates the lower bound of human accuracy in isolated word recognition tasks. LDL-AURIS thus provides a mathematically simple yet powerful characterisation of the comprehension of single words as found in English spontaneous speech.
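The core of the linear discriminative learning setup described in the two abstracts above is a single linear mapping, with no hidden layers, from real-valued acoustic form vectors to real-valued semantic vectors, estimated in closed form by multivariate multiple regression (the end state of Rescorla-Wagner learning). The sketch below illustrates this with random synthetic vectors; the dimensions and the cosine-similarity decision rule are assumptions for demonstration, not the models' actual feature extraction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: C holds acoustic form vectors (one row per word
# token), S holds the corresponding real-valued semantic vectors.
n_words, n_cues, n_sem = 50, 200, 30
C = rng.normal(size=(n_words, n_cues))
S = rng.normal(size=(n_words, n_sem))

# Multivariate multiple regression in closed form: find B minimizing
# ||C @ B - S||, i.e. the limit of incremental discriminative learning.
B, *_ = np.linalg.lstsq(C, S, rcond=None)

# Comprehension: map a form vector into semantic space, then select the
# word whose semantic vector is most similar (cosine similarity).
S_hat = C @ B

def recognize(s_hat, S):
    sims = (S @ s_hat) / (np.linalg.norm(S, axis=1) * np.linalg.norm(s_hat))
    return int(np.argmax(sims))

accuracy = np.mean([recognize(S_hat[i], S) == i for i in range(n_words)])
```

On this toy data the mapping recovers the training items essentially perfectly; the interesting result reported in the abstracts is that the same "wide learning" architecture generalizes to unseen spoken words.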