2016 26th International Conference Radioelektronika (RADIOELEKTRONIKA)
DOI: 10.1109/radioelek.2016.7477435
Unsupervised speech transcription and alignment based on two complementary ASR systems

Cited by 4 publications (3 citation statements)
References 6 publications
“…for initial experiments with unsupervised transcription, annotation and acquisition of large speech databases, we have created a new speech recognition system architecture based on the complementarity of two Slovak LVCSR systems (see fig. 1, LVCSR 1 and LVCSR 2) [1], [17]. The Slovak LVCSR system uses the open-source recognition engine Julius [18], modified to support multi-threaded parallel speech recognition and to share acoustic and language models among all instances to save memory.…”
Section: Automatic Speech Transcription
confidence: 99%
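The architecture quoted above — several recognizer instances running in parallel while reading one shared set of acoustic and language models — can be sketched roughly as follows. This is a minimal illustration only: the names `SHARED_MODELS`, `recognize`, and the segment files are hypothetical, and the actual system is a modified Julius engine rather than Python code.

```python
from concurrent.futures import ThreadPoolExecutor

# One read-only model set shared by every recognizer instance,
# standing in for acoustic/language models loaded once instead of
# once per instance (the memory-saving point made in the quote).
SHARED_MODELS = {
    "acoustic_model": "AM (loaded once)",
    "language_model": "LM (loaded once)",
}

def recognize(segment):
    """Hypothetical per-instance decode: each worker reads the shared
    models rather than loading private copies."""
    am = SHARED_MODELS["acoustic_model"]
    lm = SHARED_MODELS["language_model"]
    return f"transcript of {segment} via {am} / {lm}"

# Hypothetical input segments decoded in parallel threads.
segments = [f"segment_{i}.wav" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    transcripts = list(pool.map(recognize, segments))
```

Because the models are only read, no locking is needed; each thread decodes its own segment independently.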
“…The first acoustic model (AM 1) was trained on 320 hours of manually annotated speech recordings of judicial readings and parliament proceedings [20]. The second model (AM 2) was trained on a database of 330 hours of manually annotated speech recordings acquired from main broadcast news [21] and Court TV shows with a high degree of spontaneity [1], [21]. Both acoustic models (AM 1 and AM 2) were generated from feature vectors of the standard 39-dimensional form: mel-frequency cepstral coefficients together with their delta and acceleration coefficients, with cepstral mean normalization enabled.…”
Section: Automatic Speech Transcription
confidence: 99%
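The 39-dimensional feature vectors described above are conventionally 13 static cepstral coefficients stacked with their delta and acceleration (delta-delta) coefficients, followed by cepstral mean normalization. A minimal numpy sketch of that assembly step (assuming the 13 static coefficients are already computed; the function names are illustrative, not from the paper):

```python
import numpy as np

def deltas(feat, window=2):
    """HTK-style regression deltas along the time axis (frames x coeffs)."""
    padded = np.pad(feat, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(t * t for t in range(1, window + 1))
    n = len(feat)
    return sum(
        t * (padded[window + t:window + t + n] -
             padded[window - t:window - t + n])
        for t in range(1, window + 1)
    ) / denom

def make_39dim(static_mfcc):
    """Stack 13 static MFCCs with delta and acceleration coefficients,
    then apply cepstral mean normalization (subtract the per-coefficient
    mean over the utterance)."""
    d = deltas(static_mfcc)
    dd = deltas(d)
    feats = np.hstack([static_mfcc, d, dd])          # (frames, 39)
    return feats - feats.mean(axis=0, keepdims=True)  # CMN

# Toy example: 100 frames of 13 "static" coefficients.
rng = np.random.default_rng(0)
static = rng.standard_normal((100, 13))
obs = make_39dim(static)
print(obs.shape)  # (100, 39)
```

After CMN each of the 39 coefficient streams has zero mean over the utterance, which compensates for stationary channel effects.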