2007
DOI: 10.1016/j.csl.2007.03.003

Automatic phonetic transcription of large speech corpora

Abstract: Most large speech corpora are delivered with a lexicon that contains a canonical transcription of every word in the orthographic transcription. Such a lexicon can be used for generating a hypothetical 'canonical' phonetic transcription from the orthography. In addition, time and money permitting, some speech corpora are provided with a manually verified broad phonetic transcription of at least part of the material. Since the manual verification of phonetic transcriptions is time-consuming and expensive, we inv…

Cited by: 19 publications (22 citation statements)
References: 19 publications
“…We used a lexicon with many pronunciation variants for each word, which we generated by means of rules applied to the canonical pronunciations. Contrary to Van Bael et al. (2007b) and Cucchiarini and Binnenpoorte (2002), whose rules were insensitive to the stress pattern and syllable structure of the word, our rules are sensitive to this information. As a result, we obtained a larger number of probable variants.…”
Section: Introduction (mentioning)
confidence: 72%
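
The rule-based generation of pronunciation variants mentioned in this statement can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two rules, the phone symbols, the example word, and the names `RULES`, `apply_once`, and `generate_variants` come from neither paper. The rules here are of the simple context-insensitive kind; they do not model stress or syllable structure.

```python
# Hypothetical sketch: expand a canonical pronunciation into variants by
# optionally applying rewrite rules. Rules and phone symbols are invented
# for illustration and are not taken from the cited papers.

Phones = tuple[str, ...]
Rule = tuple[Phones, Phones]  # (pattern to match, replacement)

RULES: list[Rule] = [
    (("@",), ()),          # assumed rule: optional schwa deletion
    (("s", "t"), ("s",)),  # assumed rule: optional /t/ deletion after /s/
]


def apply_once(phones: Phones, rule: Rule) -> set[Phones]:
    """Return every sequence obtained by applying the rule at one position."""
    pattern, replacement = rule
    n = len(pattern)
    results = set()
    for i in range(len(phones) - n + 1):
        if phones[i:i + n] == pattern:
            results.add(phones[:i] + replacement + phones[i + n:])
    return results


def generate_variants(canonical: Phones, rules: list[Rule] = RULES) -> set[Phones]:
    """Collect the canonical form plus all variants reachable via the rules.

    Terminates here because every assumed rule shortens the sequence.
    """
    variants = {canonical}
    frontier = [canonical]
    while frontier:
        current = frontier.pop()
        for rule in rules:
            for variant in apply_once(current, rule):
                if variant not in variants:
                    variants.add(variant)
                    frontier.append(variant)
    return variants


# Hypothetical canonical transcription used only as sample input.
example_variants = generate_variants(("p", "o", "s", "t", "@", "r"))
```

Making such rules sensitive to stress or syllable position, as the quoted authors describe, would require richer input than a flat phone sequence, for example phones annotated with syllable and stress labels.
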
“…The models were trained at a frame shift of 5 ms and a window length of 25 ms, where for each frame 13 MFCCs (i.e., the mel-scaled cepstral coefficients C0-C12) and their first and second order derivatives (39 features) were calculated. We used a shorter frame shift than the default of 10 ms used in earlier studies of segmental reductions (e.g., Van Bael et al., 2007b; Adda-Decker et al., 2005; Schuppler et al., 2009) in order to achieve more accurate positions of the segment boundaries and in order to be able to identify very short segments. With a frame shift of 5 ms and acoustic models consisting of three emitting states (no skips), segments will be assigned a minimum length of 15 ms.…”
Section: Corpus Data (mentioning)
confidence: 99%
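
The arithmetic in this statement can be checked with a short sketch. The function names are hypothetical; the only inputs are the figures quoted above (13 static coefficients, two derivative orders, a 5 ms or 10 ms frame shift, three emitting states without skips).

```python
# Sketch of the arithmetic in the quoted statement. Function names are
# hypothetical; the numbers come from the quotation itself.

def feature_dimension(n_static: int = 13, derivative_orders: int = 2) -> int:
    """Static coefficients plus one same-sized block per derivative order."""
    return n_static * (1 + derivative_orders)


def min_segment_duration_ms(frame_shift_ms: float, emitting_states: int = 3) -> float:
    """Shortest segment a left-to-right model without skips can be assigned.

    Each emitting state must consume at least one frame, so the minimum
    duration is the number of emitting states times the frame shift.
    """
    return emitting_states * frame_shift_ms


features = feature_dimension()               # 13 * 3 = 39
min_at_5ms = min_segment_duration_ms(5.0)    # 3 * 5 ms = 15 ms
min_at_10ms = min_segment_duration_ms(10.0)  # 3 * 10 ms = 30 ms
```

Under the same assumptions, the 10 ms default would give a minimum of 30 ms, which is why the shorter frame shift helps with very short segments.
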