2016 26th International Conference Radioelektronika (RADIOELEKTRONIKA)
DOI: 10.1109/radioelek.2016.7477435
Unsupervised speech transcription and alignment based on two complementary ASR systems

Cited by 4 publications (3 citation statements)
References 6 publications
“…for initial experiments with unsupervised transcription, annotation and acquisition of large speech databases, we have created a new speech recognition system architecture based on the complementarity of two Slovak LVCSR systems (see fig. 1, LVCSR 1 and LVCSR 2) [1], [17]. The Slovak LVCSR system uses the open-source recognition engine Julius [18], modified to support multi-threaded parallel speech recognition and to share acoustic and language models among all instances to save memory.…”
Section: Automatic Speech Transcription
confidence: 99%
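The architecture quoted above — several recognizer instances running in parallel while reading one shared set of acoustic and language models — can be sketched roughly as follows. This is a minimal illustration only: the names `SHARED_MODELS`, `recognize`, and the segment files are hypothetical, and the actual system is a modified Julius engine rather than Python code.

```python
from concurrent.futures import ThreadPoolExecutor

# One read-only model set shared by every recognizer instance,
# standing in for acoustic/language models loaded once instead of
# once per instance (the memory-saving point made in the quote).
SHARED_MODELS = {
    "acoustic_model": "AM (loaded once)",
    "language_model": "LM (loaded once)",
}

def recognize(segment):
    """Hypothetical per-instance decode: each worker reads the shared
    models rather than loading private copies."""
    am = SHARED_MODELS["acoustic_model"]
    lm = SHARED_MODELS["language_model"]
    return f"transcript of {segment} via {am} / {lm}"

# Hypothetical input segments decoded in parallel threads.
segments = [f"segment_{i}.wav" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    transcripts = list(pool.map(recognize, segments))
```

Because the models are only read, no locking is needed; each thread decodes its own segment independently.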
“…The first acoustic model (AM 1) was trained on 320 hours of manually annotated speech recordings of judicial readings and parliament proceedings [20]. The second model (AM 2) was trained on a database of 330 hours of manually annotated speech recordings acquired from main broadcast news [21] and Court TV shows with a high degree of spontaneity [1], [21]. Both acoustic models (AM 1 and AM 2) were generated from feature vectors of the standard 39-dimensional form: mel-frequency cepstral coefficients together with their delta and acceleration coefficients, with cepstral mean normalization enabled.…”
Section: Automatic Speech Transcription
confidence: 99%
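The 39-dimensional feature vectors described above are conventionally 13 static cepstral coefficients stacked with their delta and acceleration (delta-delta) coefficients, followed by cepstral mean normalization. A minimal numpy sketch of that assembly step (assuming the 13 static coefficients are already computed; the function names are illustrative, not from the paper):

```python
import numpy as np

def deltas(feat, window=2):
    """HTK-style regression deltas along the time axis (frames x coeffs)."""
    padded = np.pad(feat, ((window, window), (0, 0)), mode="edge")
    denom = 2 * sum(t * t for t in range(1, window + 1))
    n = len(feat)
    return sum(
        t * (padded[window + t:window + t + n] -
             padded[window - t:window - t + n])
        for t in range(1, window + 1)
    ) / denom

def make_39dim(static_mfcc):
    """Stack 13 static MFCCs with delta and acceleration coefficients,
    then apply cepstral mean normalization (subtract the per-coefficient
    mean over the utterance)."""
    d = deltas(static_mfcc)
    dd = deltas(d)
    feats = np.hstack([static_mfcc, d, dd])          # (frames, 39)
    return feats - feats.mean(axis=0, keepdims=True)  # CMN

# Toy example: 100 frames of 13 "static" coefficients.
rng = np.random.default_rng(0)
static = rng.standard_normal((100, 13))
obs = make_39dim(static)
print(obs.shape)  # (100, 39)
```

After CMN each of the 39 coefficient streams has zero mean over the utterance, which compensates for stationary channel effects.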