2017
DOI: 10.1515/jazcas-2017-0044

TEDxSK and JumpSK: A New Slovak Speech Recognition Dedicated Corpus

Abstract: This paper describes a new Slovak corpus dedicated to speech recognition, built from TEDx talks and Jump Slovakia lectures. The proposed speech database consists of 220 talks and lectures with a total duration of about 58 hours. The annotated speech database was generated automatically in an unsupervised manner, using acoustic speech segmentation based on principal component analysis and automatic speech transcription by two complementary speech recognition systems. The evaluation data consisting of 50 manually annot…

Cited by 3 publications
(1 citation statement)
References 20 publications
“…Today, the internet has various resource types, for example, social media, blogs, twitter, and new portals, which offer a lot of speech data and which can be freely downloaded. Moreover, it has been proved that the corpora created on internet resources yielded promising results [8] [9]. Therefore, speech data was collected first from the web news.…”
Section: Collecting the Data From The Web News (citation type: mentioning)
confidence: 99%