Hadrien Titeux scite author profile

Pyannote.Audio: Neural Building Blocks for Speaker Diarization

We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.

Phonemizer: Text to Phones Transcription for Multiple Languages in Python

Bernard, Titeux. 2021. JOSS.

Can statistical learning bootstrap early language acquisition? A modeling investigation

Lavechin, Seyssel, Titeux et al. 2022. Preprint.

Before they even produce their first word, infants start developing a language-specific perception, recognize the auditory form of frequent words, and develop a rudimentary knowledge of grammatical categories. A major question in language development is: what mechanisms are responsible for the effortless learning infants demonstrate? In-laboratory experiments have shown that young infants are exquisitely sensitive to fine-grained statistical regularities of their speech input. This has led researchers to propose statistical learning as the cornerstone mechanism of early language acquisition. While the statistical learning account has been influential, the extent to which it can explain early language acquisition is still controversial. Recent computational studies provide evidence in favour of the statistical learning hypothesis for sound learning, but can this result be extended to higher level linguistic categories? Here, we introduce STELA, a developmental and psycholinguistic-inspired computational model that simulates how infants might learn at multiple linguistic levels simultaneously based on statistical analysis of raw audio signals. Our algorithm uses only the raw input without any human annotation, and it is trained to predict future segments of speech based on past ones. It reproduces the pattern of parallel learning across sound and word levels reported in infants: it learns to discriminate sounds, recognizes the auditory form of words, and organizes sounds and words along linguistic dimensions. This suggests that statistical learning from raw speech is sufficient to bootstrap early language acquisition at the sound and word levels.


Speaker Detection in the Wild: Lessons Learned from JSALT 2019

García, Villalba, Bredin et al. 2020.

Vocal Markers from Sustained Phonation in Huntington’s Disease

Riad, Titeux, Montillot et al. 2020.

Disease-modifying treatments are currently assessed in neurodegenerative diseases. Huntington's Disease represents a unique opportunity to design automatic sub-clinical markers, even in premanifest gene carriers. We investigated phonatory impairments as potential clinical markers and propose them for both diagnosis and follow-up of gene carriers. We used two sets of features: phonatory features and Modulation Power Spectrum features. We found that phonation is not sufficient for the identification of sub-clinical disorders of premanifest gene carriers. According to our regression results, phonatory features are suitable for predicting clinical performance in Huntington's Disease.



Pyannote.Audio: Neural Building Blocks for Speaker Diarization
