Human beatboxing is a vocal art that uses the speech organs to produce percussive sounds and imitate musical instruments. Beatbox sound classification is a current challenge with applications in automatic database annotation and music-information retrieval. In this study, a human-beatbox sound recognition system was developed by adapting the Kaldi toolbox, which is already widely used for automatic speech recognition. The corpus consisted of eighty boxemes, recorded repeatedly by two beatboxers. The sounds were annotated and transcribed for the system by means of a beatbox-specific pictographic writing system (Vocal Grammatics). The robustness of the recognition system to recording conditions was assessed on recordings from six different microphones and settings. Decoding was performed with monophone acoustic models trained with a classical HMM-GMM approach. Four parameters of the system were tested: i) the number of HMM states, ii) the number of MFCCs, iii) the presence or absence of a pause boxeme in the right and left contexts in the lexicon, and iv) the silence probability. Our best model was obtained with a pause added in the left and right contexts of each boxeme in the lexicon, a silence probability of 0.8, 22 MFCCs, and three-state HMMs. In this configuration, the boxeme error rate was lowered to 15.13%.
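In a standard Kaldi recipe, the settings described above correspond to the feature-extraction configuration and the lexicon-preparation step. The following is a minimal sketch under that assumption; the directory paths and the "<UNK>" out-of-vocabulary entry are illustrative placeholders, not details taken from the paper:

```shell
# conf/mfcc.conf -- raise the cepstral count from Kaldi's default of 13 to 22
--num-ceps=22

# Build the lang directory; --sil-prob 0.8 sets the probability of optional
# silence around each lexicon entry in the lexicon FST.
utils/prepare_lang.sh --sil-prob 0.8 \
  data/local/dict "<UNK>" data/local/lang data/lang

# Extract the 22-dimensional MFCCs using the config above
steps/make_mfcc.sh --mfcc-config conf/mfcc.conf \
  data/train exp/make_mfcc/train mfcc
```

The number of HMM states per phone (boxeme) is likewise controlled by the topology that `prepare_lang.sh` generates, so the grid of parameters reported in the abstract can be swept by varying these options alone.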
Access to higher education for students who are deaf is below the national average. Recently, a growing number of applications for the automatic transcription of speech have claimed to make everyday speech more accessible to people who are deaf or hard of hearing. However, these systems require a good command of the written language, and a significant proportion of the deaf public has low literacy skills. Moreover, we have very little data on how these audiences actually deal with captions. In this paper, we describe the MANES project, whose long-term goal is to assess the usefulness of captioning for making lectures accessible to students who are deaf. We present the first technical results of a real-time system that adapts course captioning to the target audience.

CCS Concepts: • Human-centered computing → Accessibility technologies.
Self-supervised learning has brought remarkable improvements in many fields, such as computer vision and language and speech processing, by exploiting large amounts of unlabeled data. In the specific context of speech, however, and despite promising results, there is a clear lack of standardization in the evaluation processes that would allow precise comparisons of these models, especially for languages other than English. We present to the French-speaking community LeBenchmark, an open-source and reproducible reference framework for evaluating self-supervised models on French speech corpora. It comprises four tasks: automatic speech recognition, spoken language understanding, automatic speech translation, and automatic emotion recognition. We encourage the French-speaking community to use this benchmark in its future experiments, in particular for the evaluation of self-supervised models.