In this paper the phonetic and speech corpus DIMEx100 for Mexican Spanish is presented. We discuss both the linguistic motivation and the computational tools employed for the design, collection and transcription of the corpus. The phonetic transcription methodology is based on recent empirical studies proposing a new basic set of allophones and phonological rules for the dialect of the central part of Mexico. These phonological rules have been implemented in a visualization tool that provides the expected phonetic representation of a text, and also a default temporal alignment between the spoken corpus and its phonetic representation. The tools are also used to compute the properties of the corpus and compare these figures with previous work. 1 Among the few available corpus we can mention the Latino-40 Spanish Read News for Latin-American Spanish, available LDC (http://www.ldc.upenn.edu), the ALBAYZIN Corpus for Castillan Spanish and the Spanish Speech Corpus 1, these later two available at ELDA (http://www.elda.fr).
In this paper the transcription and evaluation of the corpus DIMEx100 for Mexican Spanish is presented. First we describe the corpus and explain the linguistic and computational motivation for its design and collection process; then, the phonetic antecedents and the alphabet adopted for the transcription task are presented; the corpus has been transcribed at three different granularity levels, which are also specified in detail. The corpus statistics for each transcription level are also presented. A set of phonetic rules describing phonetic context observed empirically in spontaneous conversation is also validated with the transcription. The corpus has been used for the construction of acoustic models and a phonetic dictionary for the construction of a speech recognition system. Initial performance results suggest that the data can be used to train good quality acoustic models.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.