We introduce pyannote.audio, an open-source toolkit written in Python for speaker diarization. Built on the PyTorch machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines. pyannote.audio also comes with pre-trained models covering a wide range of domains for voice activity detection, speaker change detection, overlapped speech detection, and speaker embedding, reaching state-of-the-art performance for most of them.
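As a rough illustration of how detection-style building blocks (voice activity or overlapped speech detection) feed a pipeline, the sketch below turns frame-level scores into time segments by thresholding. It is not pyannote.audio's API: the `binarize` function, the `frame_step` parameter, and the 0.5 threshold are hypothetical names and values chosen for the example.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


def binarize(scores: List[float], frame_step: float, threshold: float = 0.5) -> List[Segment]:
    """Turn frame-level detection scores into (start, end) segments.

    Hypothetical helper: a frame is "active" when its score reaches the
    threshold, and consecutive active frames are merged into one segment.
    """
    segments: List[Segment] = []
    start: Optional[float] = None
    for i, score in enumerate(scores):
        active = score >= threshold
        if active and start is None:
            # An active region begins at this frame.
            start = i * frame_step
        elif not active and start is not None:
            # The active region ended just before this frame.
            segments.append((start, i * frame_step))
            start = None
    if start is not None:
        # The last region runs to the end of the signal.
        segments.append((start, len(scores) * frame_step))
    return segments
```

The same post-processing would apply to any block that outputs one score per frame, which is one reason such blocks can be combined into a single pipeline.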
Before they even produce their first word, infants start developing language-specific perception, recognize the auditory form of frequent words, and develop a rudimentary knowledge of grammatical categories. A major question in language development is: what mechanisms are responsible for the effortless learning infants demonstrate? In-laboratory experiments have shown that young infants are exquisitely sensitive to fine-grained statistical regularities of their speech input. This has led researchers to propose statistical learning as the cornerstone mechanism of early language acquisition. While the statistical learning account has been influential, the extent to which it can explain early language acquisition is still controversial. Recent computational studies provide evidence in favour of the statistical learning hypothesis for sound learning, but can this result be extended to higher-level linguistic categories? Here, we introduce STELA, a developmentally and psycholinguistically inspired computational model that simulates how infants might learn at multiple linguistic levels simultaneously based on statistical analysis of raw audio signals. Our algorithm uses only the raw input without any human annotation, and it is trained to predict future segments of speech based on past ones. It reproduces the pattern of parallel learning across sound and word levels reported in infants: it learns to discriminate sounds, recognizes the auditory form of words, and organizes sounds and words along linguistic dimensions. This suggests that statistical learning from raw speech is sufficient to bootstrap early language acquisition at the sound and word levels.
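One common way to train a model to predict future segments from past ones is a contrastive objective: the model's prediction should score the true future segment higher than unrelated distractor segments. The abstract does not specify the objective STELA uses, so the sketch below is an assumption for illustration only; `contrastive_loss` and its arguments are hypothetical names.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(
    prediction: Sequence[float],
    true_future: Sequence[float],
    distractors: List[Sequence[float]],
) -> float:
    """Negative log-probability of the true future among all candidates.

    Each candidate is scored by its dot product with the prediction, and a
    softmax over these scores gives the probability of the true future.
    """
    scores = [dot(prediction, true_future)]
    scores += [dot(prediction, d) for d in distractors]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[0]
```

The loss is small when the prediction aligns with the true future and large when it aligns with a distractor, so minimizing it requires no human annotation, only the speech signal itself.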
Disease-modifying treatments are currently being assessed in neurodegenerative diseases. Huntington's Disease represents a unique opportunity to design automatic sub-clinical markers, even in premanifest gene carriers. We investigated phonatory impairments as potential clinical markers and propose them for both diagnosis and follow-up of gene carriers. We used two sets of features: phonatory features and modulation power spectrum features. We found that phonation is not sufficient to identify sub-clinical disorders in premanifest gene carriers. According to our regression results, however, phonatory features are suitable for predicting clinical performance in Huntington's Disease.