2020
DOI: 10.1007/978-3-030-46140-9_6
Automatic Speech Recognition of Quechua Language Using HMM Toolkit

Cited by 4 publications (5 citation statements)
References 7 publications
“…Both corpora were preprocessed to improve the quality of the audios: excess background noise was removed, and audios with no voice, with background music, or not spoken in Quechua were discarded, the latter filter made possible by the first automatic speech recognizer developed for Quechua [19]. Finally, audios longer than 30 seconds were divided into segments of no more than 30 seconds and converted to mono channel, 16 kHz sampling, 16-bit precision encoding and WAV format.…”
Section: Methods
confidence: 99%
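The segmentation and format-conversion step quoted above can be sketched in a few lines of Python. This is a minimal illustration assuming the pydub library (which wraps ffmpeg); the file paths and naming scheme are hypothetical and not taken from the cited papers.

```python
# Sketch of the preprocessing described in the quoted passage, using pydub.
# Input path and output prefix are hypothetical placeholders.
from pydub import AudioSegment

MAX_MS = 30 * 1000  # split audios longer than 30 seconds

def preprocess(in_path: str, out_prefix: str) -> None:
    audio = AudioSegment.from_file(in_path)
    # Convert to mono channel, 16 kHz sampling, 16-bit (2-byte) samples.
    audio = audio.set_channels(1).set_frame_rate(16000).set_sample_width(2)
    # Cut into segments of no more than 30 seconds each and export as WAV.
    for i, start in enumerate(range(0, len(audio), MAX_MS)):
        chunk = audio[start:start + MAX_MS]
        chunk.export(f"{out_prefix}_{i:03d}.wav", format="wav")

preprocess("recording.mp3", "recording_16k")
```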
“…This pilot system was developed using K-Nearest Neighbor (KNN) together with the Hidden Markov Model Toolkit [18]. [19] developed a new ASR model based on a monophone HMM topology with five states per HMM and no skip transitions, which achieved a WER of 12.7%. This model was trained on the 97.5-hour corpus of [17].…”
Section: Automatic Speech Recognition
confidence: 99%
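The topology mentioned in this statement (five emitting states per monophone, strictly left-to-right with no skip transitions) can be illustrated with its transition matrix. The sketch below uses placeholder probability values, not trained parameters from [19]; in HTK terms this corresponds to a prototype with two extra non-emitting entry/exit states.

```python
# Illustrative transition matrix for a 5-state left-to-right HMM with no
# skip transitions. Probabilities are placeholder initial values only.
import numpy as np

N = 5  # emitting states per monophone HMM
A = np.zeros((N + 2, N + 2))  # rows/cols 0 and N+1 are entry/exit states

A[0, 1] = 1.0                 # entry state always moves to state 1
for s in range(1, N + 1):
    A[s, s] = 0.6             # self-loop: stay in the same state
    A[s, s + 1] = 0.4         # advance to the next state only (no skips)

# Each non-exit row sums to 1 and no transition jumps over a state, so the
# only paths run left-to-right through all five emitting states.
assert np.allclose(A[:N + 1].sum(axis=1), 1.0)
print(A)
```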
“…But for Indigenous and other under-resourced languages, creating ASR entails first collecting a large mass of already-transcribed data, in addition to the data for the sociolinguistic experiment, so that a new ASR model can be trained. This process is still difficult and expensive, requiring (i) linguists and community experts to settle on an orthographic or phonetic representation of the language, (ii) human experts to transcribe hours of recordings (from 4 to 100 hours), and (iii) programmers to train the system on specialised computer servers (Adams et al, 2019; Besacier et al, 2014; Coto-Solano, 2021; Foley et al, 2018; Gupta & Boulianne, 2020; Levow et al, 2021; Matsuura et al, 2020; Partanen et al, 2020; Prud'hommeaux et al, 2021; Zahrer et al, 2020; Zevallos et al, 2020).…”
Section: Automatic Speech Recognition For Sociophonetics
confidence: 99%
“…Indigenous and other minority languages usually have few transcribed audio recordings, and so adapting data-hungry ASR algorithms to assist in their documentation is an active area of research (Besacier et al, 2014; Jimerson and Prud'hommeaux, 2018; Michaud et al, 2019; Foley et al, 2018; Gupta and Boulianne, 2020b,a; Zahrer et al, 2020; Thai et al, 2019; Li et al, 2020; Zevallos et al, 2019; Matsuura et al, 2020; Levow et al, 2021). This paper will examine an element that might appear obvious at first, but one where the literature is "inconclusive" (Adams, 2018), and which can have major consequences for performance: how should tones be transcribed when dealing with extremely low-resource languages?…”
Section: Introduction
confidence: 99%