2022
DOI: 10.15587/1729-4061.2022.252801
|View full text |Cite
|
Sign up to set email alerts
|

Identifying the influence of transfer learning method in developing an end-to-end automatic speech recognition system with a low data level

Abstract: Ensuring the best quality and performance of modern speech technologies, today, is possible based on the widespread use of machine learning methods. The idea of this project is to study and implement an end-to-end system of automatic speech recognition using machine learning methods, as well as to develop new mathematical models and algorithms for solving the problem of automatic speech recognition for agglutinative (Turkic) languages. Many research papers have shown that deep learning methods make it easier t… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
2
1

Citation Types

0
4
0

Year Published

2022
2022
2024
2024

Publication Types

Select...
5
4

Relationship

0
9

Authors

Journals

citations
Cited by 14 publications
(9 citation statements)
references
References 28 publications
(34 reference statements)
0
4
0
Order By: Relevance
“…Suppose that for image classification, a CNN with architecture S and parameters P is presynthesized, CNN={S, P} [14,15]. This network has already learned earlier to extract features for solving the problem of image classification.…”
Section: Formulation Of the Problemmentioning
confidence: 99%
“…Suppose that for image classification, a CNN with architecture S and parameters P is presynthesized, CNN={S, P} [14,15]. This network has already learned earlier to extract features for solving the problem of image classification.…”
Section: Formulation Of the Problemmentioning
confidence: 99%
“…Besides it has been proposed to replace UBM and i-vector classifier by deep neural network (DNN) taking into account the experience of deep learning for speech recognition [9,10]. The DNNbased d-vector framework assigns the ground-truth speaker identity of a training speech signal as the labels of the training frames of this signal.…”
Section: Analysis Of Recent Research and Publicationsmentioning
confidence: 99%
“…Raw wave neural networks take raw waves in the time domain as the inputs to extract learnable acoustic features [10,16]. It has been observed that with the use of CNN the filters of the first convolution layer capture the speaker information in low frequency regions.…”
Section: Analysis Of Recent Research and Publicationsmentioning
confidence: 99%
“…Kazakh language is a low-resource language from the Turkic family of languages, which belong to agglutinative languages. Almost all of these languages suffer from data shortages 1 . Namely, they suffer from a lack of transcribed audio data.…”
Section: Introductionmentioning
confidence: 99%