Automatic speaker diarization consists of splitting the audio signal into homogeneous segments and clustering them by speaker. However, the resulting speaker segments carry only anonymous labels. This paper proposes a solution for identifying those speakers by extracting their full names as pronounced in the show. Using a semantic classification tree automatically built on a training corpus, the full names detected in the transcription of a segment are associated with that segment or with one of its neighbors. A merging method then associates a full name with each speaker cluster instead of the anonymous label provided by the diarization. Experiments are carried out on French broadcast news recordings from the ESTER 2005 evaluation campaign. About 70% of the show duration is correctly processed for both the development and evaluation corpora. On the evaluation corpus, 18.15% of the show duration is wrongly named, and no decision is taken for 11.91%.
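As a toy illustration of the final naming step, each anonymous cluster can receive the full name most often associated with its segments. This is a hypothetical sketch, not the paper's semantic-tree method: the segment/name pairs and the simple majority vote are invented for the example.

```python
from collections import Counter

def name_clusters(segments):
    """Map each anonymous cluster label to the full name most often
    detected in (or near) its segments; clusters with no detected
    name are left out, i.e. they keep their anonymous label."""
    votes = {}
    for cluster, name in segments:
        if name is not None:
            votes.setdefault(cluster, Counter())[name] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

# Hypothetical diarization output: (cluster label, full name found in
# the segment's transcription, or None when no name was pronounced).
segments = [
    ("spk_1", "Jacques Chirac"),
    ("spk_1", "Jacques Chirac"),
    ("spk_1", None),
    ("spk_2", "Nicolas Sarkozy"),
]
print(name_clusters(segments))
# {'spk_1': 'Jacques Chirac', 'spk_2': 'Nicolas Sarkozy'}
```

A real system would weight the votes (e.g. by the word posteriors of the detected names), which is one way the tree-based association can abstain rather than guess.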
Within the framework of the Carcinologic Speech Severity Index (C2SI) INCa project, we collected a large database of French speech recordings aimed at validating Disorder Severity Indexes. Such a database will be useful for measuring the impact of oral and pharyngeal cavity cancer on speech production, and will make it possible to assess patients' quality of life after treatment. The database is composed of audio recordings from 134 sessions and associated metadata. Several intelligibility and comprehensibility levels of speech functions have been evaluated, and acoustics and prosody have been assessed. Perceptual evaluation ratings from both naive and expert juries are being produced, and automatic analyses are being carried out. The goal is to provide speech therapists and physicians with objective tools that take into account the intelligibility and comprehensibility of patients who received cancer treatment (surgery and/or radiotherapy and/or chemotherapy). The aim of this paper is to justify the necessity of such a corpus and to present its data collection. The C2SI corpus will be made available to the scientific community through the Scientific Interest Group Parolotheque.
In the context of pathological speech, perceptual evaluation remains the most widely used method for estimating intelligibility. Despite being a staple of clinical practice, it carries a well-known subjectivity, which results in greater variance and low reproducibility. On the other hand, thanks to increasing computing power and recent research, automatic evaluation has become a growing alternative to perceptual assessment. In this paper we investigate automatic prediction of speech intelligibility using the x-vector paradigm, in the context of head and neck cancer (HNC). Experimental evaluation of the proposed model shows a high correlation when applied to our corpus of HNC patients (ρ = 0.85). Our approach can also achieve very high correlation values (ρ = 0.95) when the evaluation is adapted to each individual speaker, yielding a significantly more accurate prediction while using smaller amounts of data. These results can also provide valuable insight for redesigning test protocols, which typically tend to be long and effort-intensive for patients.
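The prediction step can be sketched as a simple linear regressor over x-vector embeddings. This is a minimal illustration with random placeholder data; the paper's actual model, features, and scores are not reproduced here, and the 512-dimensional embeddings and 0-10 score scale are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 512-dim x-vectors for 40 recordings, each paired
# with a perceptual intelligibility score on a 0-10 scale (all random).
X_train = rng.normal(size=(40, 512))
y_train = rng.uniform(0.0, 10.0, size=40)

# Ridge regression in closed form: w = (X^T X + lam*I)^{-1} X^T y
lam = 1.0
w = np.linalg.solve(X_train.T @ X_train + lam * np.eye(512),
                    X_train.T @ y_train)

x_new = rng.normal(size=512)      # x-vector of a new patient recording
predicted = float(x_new @ w)
print(f"predicted intelligibility score: {predicted:.2f}")
```

Per-speaker adaptation, as mentioned above, would amount to refitting or biasing `w` on a few scored recordings of the target speaker.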
Background: Intelligibility and comprehensibility in speech disorders can be assessed both perceptually and instrumentally, but a lack of consensus exists regarding the terminology and related speech measures in both the clinical and scientific fields. Aims: To draw up a more consensual definition of intelligibility and comprehensibility and to define which assessment methods relate to both concepts, as part of their definition. Methods & Procedures: A three-round modified Delphi consensus study was carried out among clinicians, researchers and lecturers engaged in activities related to speech disorders. Outcomes & Results: Forty international experts from different fields (mainly clinicians, linguists and computer scientists) participated in elaborating a comprehensive definition of intelligibility and comprehensibility and their assessment. While both concepts are linked and contribute to functional human communication, they relate to two different reconstruction levels of the transmitted speech material. Intelligibility refers to the acoustic-phonetic decoding of the utterance, whereas comprehensibility relates to the reconstruction of the meaning of the message. Consequently, the perceptual assessment of intelligibility requires unpredictable speech material (pseudo-words, minimal word pairs, unpredictable sentences), whereas comprehensibility assessment is meaning- and context-related and entails more functional speech stimuli and tasks. Conclusions & Implications: This consensus study provides the scientific and clinical communities with a better understanding of intelligibility and comprehensibility. A comprehensive definition was drafted, including specifications regarding the tasks that best fit their assessment.
The outcome has implications for both clinical practice and scientific research, as the disambiguation improves communication between professionals and thereby increases the efficiency of patient assessment and care and benefits the progress of research as well as research translation.
In this article, we report on the use of an automatic technique to assess pronunciation in the context of several types of speech disorders. Such tools already exist, but they are more widely used in a different context, namely Computer-Assisted Language Learning, where the objective is to assess non-native pronunciation by detecting learners' mispronunciations at the segmental and/or suprasegmental level. In our work, we sought to determine whether the Goodness of Pronunciation (GOP) algorithm, which detects phone-level mispronunciations by means of automatic speech recognition, could also detect segmental deviances in disordered speech. Our main experiment is an analysis of speech from people with unilateral facial palsy. This pathology may affect the realization of certain phonemes such as bilabial plosives and sibilants. Speech read by 32 speakers at four clinical severity grades was automatically aligned, and GOP scores were computed for each phone realization. The highest scores, which indicate large dissimilarities with standard phone realizations, were obtained for the most severely impaired speakers. The corresponding speech subset was manually transcribed at the phone level; 8.3% of the phones differed from the standard pronunciations in our lexicon. The GOP technique detected 70.2% of mispronunciations, with roughly equal rates of about 30% false rejections and false acceptances. Finally, to broaden the scope of the study, we explored the correlation between GOP values and speech comprehensibility scores on a second corpus, composed of sentences recorded by six people with speech impairments due to cancer surgery or neurological disorders. Strong correlations were achieved between GOP scores and subjective comprehensibility scores (about 0.7 absolute).
Results from both experiments tend to validate the use of GOP to measure speech capability loss, a dimension that could complement physiological measures in pathologies causing speech disorders.
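The core of the GOP score can be written down compactly. A common formulation is the frame-normalised log-ratio between the likelihood of the canonical phone and that of the best competing phone; the sketch below uses invented log-likelihoods, not the recogniser-based computation from the study.

```python
def gop(log_likelihoods, target_phone, n_frames):
    """Goodness of Pronunciation: frame-normalised difference between
    the log-likelihood of the intended phone and the log-likelihood
    of the best-scoring phone over the same acoustic segment."""
    best = max(log_likelihoods.values())
    return (log_likelihoods[target_phone] - best) / n_frames

# Invented log-likelihoods for a 12-frame realisation that was meant
# to be /p/ but acoustically resembles /b/ (as facial palsy can cause
# for bilabial plosives).
lls = {"p": -40.0, "b": -34.0, "m": -55.0}
print(gop(lls, "p", 12))   # -0.5: large deviation from canonical /p/
print(gop(lls, "b", 12))   # 0.0: /b/ is the best-matching phone
```

Scores far below zero flag likely mispronunciations; thresholding them trades off the false rejections and false acceptances reported above.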
Inspired by practices at prestigious institutions (notably MIT), the pedagogical project presented here aims to use an attractive medium, namely robots, for teaching artificial intelligence through interdisciplinary projects, and to improve student success at the Bachelor's and Master's levels. By involving students in projects that let them discover the different topics taught in the Master's program, we hope to engage Bachelor's students and support them in building a study plan that suits them best. The comparative study presented here examines the effects of using robots in project-based learning on student motivation and engagement.
The combination of Automatic Speech Recognition (ASR) systems generally relies on an a posteriori merge of system outputs or on cross-adaptation. In this paper, we propose an integrated approach in which the search of a primary system is driven by the outputs of a secondary one, using the one-best hypotheses and the word posteriors gathered from the secondary system. Experiments are carried out within the experimental framework of the ESTER evaluation campaign [1]. Results show that the driven decoding algorithm significantly outperforms the two single ASR systems (-8% relative WER, -1.7% absolute). Finally, we investigate the interactions between driven decoding and cross-adaptation. The best cross-adaptation strategy, combined with the driven decoding process, yields a final absolute gain of about 1.9% WER.
In this paper, we present a complete system for audio indexing, based on state-of-the-art methods for Speech/Music/Noise segmentation and monophonic/polyphonic estimation. On top of these methods, we propose an original system for detecting superposed sources, based on analyzing the evolution of the predominant frequencies. To validate the whole system, we used different corpora: radio broadcasts, studio music, and degraded field recordings. The first results are encouraging and show the potential of our approach, which is generic and can be used on both music and speech content.
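The predominant-frequency analysis underlying the superposed-source detector can be illustrated with a single-frame spectral peak picker; this is a minimal sketch on a synthetic tone, and the paper's actual tracking of how peaks evolve over time is not reproduced.

```python
import numpy as np

def predominant_freq(frame, sr):
    """Frequency (Hz) of the strongest spectral peak in one windowed frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    return float(freqs[int(np.argmax(spectrum))])

sr = 16000
t = np.arange(1024) / sr
frame = np.sin(2 * np.pi * 440.0 * t)    # one pure 440 Hz source
print(predominant_freq(frame, sr))        # near 440 Hz, up to FFT-bin resolution
```

Following the peak count and trajectory from frame to frame is what would let a detector decide whether one source or several superposed sources are present.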