Automatic phonetic transcription of large speech corpora

Bael, Christophe Van; Boves, L.W.J.; Heuvel, H. van den; Strik, Helmer

doi:10.1016/j.csl.2007.03.003

Cited by 19 publications

(22 citation statements)

References 19 publications

“…We used a lexicon with many pronunciation variants for each word, which we generated by means of rules applied to the canonical pronunciations. Contrary to Van Bael et al (2007b) and Cucchiarini and Binnenpoorte (2002), whose rules were insensitive to the stress pattern and syllable structure of the word, our rules are sensitive to this information. As a result, we obtained a larger number of probable variants.…”

Section: Introduction (mentioning)

confidence: 72%

“…The models were trained at a frame shift of 5 ms and a window length of 25 ms, where for each frame 13 MFCCs (i.e., the mel-scaled cepstral coefficients C0-C12) and their first and second order derivatives (39 features) were calculated. We used a shorter frame shift than the default of 10 ms used in earlier studies of segmental reductions (e.g., Van Bael et al, 2007b; Adda-Decker et al, 2005; Schuppler et al, 2009) in order to achieve more accurate positions of the segment boundaries and in order to be able to identify very short segments. With a frame shift of 5 ms and acoustic models consisting of three emitting states (no skips), segments will be assigned a minimum length of 15 ms.…”

Section: Corpus Data (mentioning)

confidence: 99%
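The arithmetic in the statement above can be checked with a short sketch. The function names and the no-padding framing convention are assumptions made for illustration; they are not taken from the cited paper or from any particular toolkit.

```python
def feature_dimension(n_static: int = 13, n_derivative_orders: int = 2) -> int:
    """Static coefficients plus one copy per derivative order (13 * 3 = 39)."""
    return n_static * (1 + n_derivative_orders)


def min_segment_duration_ms(frame_shift_ms: float, n_emitting_states: int = 3) -> float:
    """Without skip transitions, each emitting state consumes at least one frame."""
    return frame_shift_ms * n_emitting_states


def frame_count(duration_ms: float, frame_shift_ms: float, window_ms: float = 25.0) -> int:
    """Number of full analysis windows in a signal (assumes no padding at the edges)."""
    if duration_ms < window_ms:
        return 0
    return 1 + int((duration_ms - window_ms) // frame_shift_ms)


if __name__ == "__main__":
    print("features per frame:", feature_dimension())
    for shift in (5, 10):
        print(
            f"shift {shift} ms: shortest segment {min_segment_duration_ms(shift)} ms,",
            f"{frame_count(1000, shift)} frames per second of speech",
        )
```

Halving the frame shift halves the shortest segment the aligner can assign, and it roughly doubles the number of frames to process.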

“…Only the columns 'Type' and 'Order' are relevant for this section, the other columns will be discussed in Section 4. Some of the rules are well-studied for Dutch and have been used before in the automatic generation of phonemic transcriptions (Van Bael et al, 2007b; Kessens et al, 2003); these are: 'schwa-insertion' (1.0), '[n]-deletion after schwa' (1.1), 'regressive assimilation of voice for obstruents before voiced plosives' (1.2), 'devoicing of plosives following voiceless plosives' (1.3), 'devoicing of fricatives in all word-positions' (1.4), '[t]-deletion in word-final position, preceded by consonant' (4.8) and '[r]-deletion after schwa' (4.5). The other rules were formulated on the basis of the research by Ernestus (2000) on voice assimilation and segment reduction in casual Dutch.…”

Section: Generation Of Pronunciation Variants (mentioning)

confidence: 99%
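As a rough illustration of how ordered rules can expand a canonical form into variants, the sketch below implements two of the rules named above: '[n]-deletion after schwa' and '[t]-deletion in word-final position, preceded by consonant'. The phone symbols (a SAMPA-like notation with "@" for schwa), the vowel inventory, and the toy input are assumptions; the cited papers' actual rule formalism is not shown in the excerpt.

```python
# Hypothetical, incomplete vowel inventory in a SAMPA-like notation.
VOWELS = {"@", "a", "e", "i", "o", "u", "y", "A", "E", "I", "O", "Y"}


def n_deletion_after_schwa(phones):
    """Return variants in which one [n] directly following schwa is deleted."""
    return [
        phones[:i] + phones[i + 1:]
        for i in range(1, len(phones))
        if phones[i] == "n" and phones[i - 1] == "@"
    ]


def final_t_deletion(phones):
    """Return the variant without word-final [t] if a consonant precedes it."""
    if len(phones) >= 2 and phones[-1] == "t" and phones[-2] not in VOWELS:
        return [phones[:-1]]
    return []


# Rules are applied in this order; each rule is optional, so inputs are kept.
RULES = [n_deletion_after_schwa, final_t_deletion]


def generate_variants(canonical):
    """Apply each rule to every variant produced so far and collect the results."""
    variants = {tuple(canonical)}
    for rule in RULES:
        variants |= {new for variant in variants for new in rule(variant)}
    return variants


if __name__ == "__main__":
    # Toy phone string, not a claim about any real Dutch word.
    for variant in sorted(generate_variants(["l", "o", "p", "@", "n"])):
        print(" ".join(variant))
```

Because every rule is optional, the variant set can only grow with each rule, which is one reason rule-based lexica tend to be large.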

“…In general, pronunciation variants can either be extracted from a large corpus that has already been transcribed manually at the segmental level (data-driven approach, e.g., Hämäläinen et al, 2007; Kessens et al, 2003) or they can be generated by applying a set of rules, proposed in the phonological/phonetic literature, to the canonical forms in the lexicon (knowledge-based approach, e.g., Van Bael et al, 2007b). The set of variants derived with a data-driven approach depends on the corpus from which the variants are extracted and tends to contain fewer pronunciation variants for most words than a lexicon created with the knowledge-based approach.…”

Section: Generation Of Pronunciation Variants (mentioning)

confidence: 99%
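The data-driven alternative described above can be sketched as a simple count over a manually transcribed corpus. The input format (pairs of orthographic word and observed transcription), the `min_count` threshold, and the toy tokens are all assumptions for illustration, not details from the cited studies.

```python
from collections import Counter, defaultdict


def extract_variants(aligned_tokens, min_count=1):
    """Collect the transcriptions observed for each word, most frequent first.

    aligned_tokens: iterable of (word, transcription) pairs, where the
    transcription is a string of space-separated phone symbols.
    """
    counts = defaultdict(Counter)
    for word, transcription in aligned_tokens:
        counts[word][transcription] += 1
    return {
        word: [t for t, c in counter.most_common() if c >= min_count]
        for word, counter in counts.items()
    }


if __name__ == "__main__":
    # Invented toy tokens; a real corpus would supply these pairs.
    tokens = [
        ("w1", "a b @ n"),
        ("w1", "a b @"),
        ("w1", "a b @"),
        ("w2", "k a t"),
    ]
    print(extract_variants(tokens))
```

A lexicon built this way can only contain variants that actually occur in the source corpus, which is consistent with the statement's remark that it tends to be smaller than a rule-generated one.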

“…A more recently available method is to create phonetic transcriptions by using an automatic speech recognition (ASR) system to determine the most likely pronunciation variant for each word in a spoken corpus (e.g., Binnenpoorte, 2006; Cucchiarini and Binnenpoorte, 2002; Van Bael et al, 2007b). With this method large amounts of speech material can be transcribed in a relatively short period of time.…”

Section: Introduction (mentioning)

confidence: 99%
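The selection step in this method can be sketched as follows, under a strong simplification: each word is scored independently, and the acoustic score is supplied by a caller-provided function. In a real system the scores would come from forced alignment with trained acoustic models; `score` here is a hypothetical stand-in, and the lexicon and scores in the usage example are invented.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Variant = Tuple[str, ...]


def pick_variants(
    words: Sequence[str],
    lexicon: Dict[str, List[Variant]],
    score: Callable[[int, Variant], float],
) -> List[Variant]:
    """For each word token, keep the pronunciation variant with the highest score.

    score(position, variant) is assumed to return a log-likelihood-like value
    for realizing the token at that position with the given variant.
    """
    return [
        max(lexicon[word], key=lambda variant: score(position, variant))
        for position, word in enumerate(words)
    ]


if __name__ == "__main__":
    toy_lexicon = {"w1": [("a", "b", "@", "n"), ("a", "b", "@")]}
    # Invented scores: the first token fits the full form, the second the reduced form.
    toy_scores = {
        (0, ("a", "b", "@", "n")): -10.0,
        (0, ("a", "b", "@")): -14.0,
        (1, ("a", "b", "@", "n")): -20.0,
        (1, ("a", "b", "@")): -12.0,
    }
    chosen = pick_variants(["w1", "w1"], toy_lexicon, lambda p, v: toy_scores[(p, v)])
    print(chosen)
```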

Acoustic reduction in conversational Dutch: A quantitative analysis based on automatically generated segmental transcriptions

Schuppler, Ernestus, Scharenborg, et al. 2011. Journal of Phonetics

In spontaneous, conversational speech, words are often reduced compared to their citation forms, such that a word like yesterday may sound like ['jESeI]. The present paper investigates such acoustic reduction. The study of reduction needs large corpora that are transcribed phonetically. The first part of this paper describes an automatic transcription procedure used to obtain such a large phonetically transcribed corpus of Dutch spontaneous dialogues, which is subsequently used for the investigation of acoustic reduction. First, the orthographic transcriptions were adapted for automatic processing. Next, the phonetic transcription of the corpus was created by means of a forced alignment with a lexicon with multiple pronunciation variants per word. These variants were generated by applying phonological and reduction rules to the canonical phonetic transcriptions of the words. The second part of this paper reports the results of a quantitative analysis of reduction in the corpus on the basis of the generated transcriptions and gives an inventory of segmental reductions in standard Dutch. Overall, we found that reduction is more pervasive in spontaneous Dutch than previously documented.

Corpora and Exemplars in Phonology

Ernestus, Baayen. 2011. The Handbook of Phonological Theory

Development of a Large Spontaneous Speech Database of Agglutinative Hungarian Language

Neuberger, Gyarmathy, Gráczi, et al. 2014. Text, Speech and Dialogue

In this paper, a large Hungarian spoken language database is introduced. This phonetically-based multi-purpose database contains various types of spontaneous and read speech from 333 monolingual speakers (about 50 minutes of speech per speaker). This study presents the background and motivation of the development of the BEA Hungarian database, describes its protocol and the transcription procedure, and also presents existing and proposed research using this database. Owing to its recording protocol and transcription, it provides challenging material for comparing segmental structures of speech, including across languages.
