2021
DOI: 10.32604/iasc.2021.016457
Oral English Speech Recognition Based on Enhanced Temporal Convolutional Network

Cited by 9 publications
(6 citation statements)
References 21 publications
“…, s_n, s_p} first. Then the duration predictor outputs the window lengths [2, 1, 4, 3] and window positions [1, 2.5, 5, 8.5], so the widened window lengths are [4, 3, 6, 5], and four speech segments can be obtained. The first segment is {s_p, s_1, s_2, s_3}, the second segment is {s_2, s_3, s_4}, and so on.…”
Section: Audio Spectrum Extract
confidence: 99%
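The windowing described in the quote can be sketched in a few lines. This is a minimal reconstruction, not the paper's implementation: the exact index convention is an assumption, chosen so that each window is widened by one frame on each side and the worked example above (lengths [2, 1, 4, 3], positions [1, 2.5, 5, 8.5]) reproduces the segments {s_p, s_1, s_2, s_3} and {s_2, s_3, s_4}, with index 0 standing for the padding frame s_p.

```python
def extract_segments(lengths, positions):
    """Sketch of windowed segment extraction (index convention assumed).

    Each predicted window is widened by one frame on both sides, then
    frame indices are taken around the (possibly fractional) centre.
    Index 0 denotes the padding frame s_p.
    """
    segments = []
    for length, pos in zip(lengths, positions):
        widened = length + 2                # pad one frame on each side
        start = int(pos - widened / 2) + 1  # first frame index in the window
        segments.append(list(range(start, start + widened)))
    return segments

segs = extract_segments([2, 1, 4, 3], [1, 2.5, 5, 8.5])
# segs[0] -> [0, 1, 2, 3]  i.e. {s_p, s_1, s_2, s_3}
# segs[1] -> [2, 3, 4]     i.e. {s_2, s_3, s_4}
```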
“…Previous research offers two main ways to obtain prosodic features from speech: frame-by-frame encoding [5,6] and speech-to-vector encoding [7,8]. Frame-by-frame encoding maps each speech frame to a vector, so it captures more detailed features from the speech.…”
Section: Introduction
confidence: 99%
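The contrast between the two encoding styles can be illustrated with a toy sketch. The linear encoder and mean pooling below are hypothetical stand-ins (not from the cited works); the point is only the shape of the result: one vector per frame versus one vector per utterance.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.standard_normal((120, 80))   # 120 speech frames x 80 spectral features

# Frame-by-frame encoding: one vector per frame (hypothetical linear encoder)
W = rng.standard_normal((80, 64))
frame_vectors = frames @ W                # shape (120, 64): detailed, time-resolved

# Speech-to-vector encoding: one vector per utterance (mean pooling as a stand-in)
utterance_vector = frame_vectors.mean(axis=0)   # shape (64,): compact summary
```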
“…The parameters of the neural network are updated layer by layer and frame by frame via the back-propagation algorithm. When CTC decodes the output, the output sequence must be optimized to obtain the final label sequence [12]. This study adopts the best-path decoding algorithm, which assumes that the maximum-probability path π and the maximum-probability label l* correspond one-to-one, i.e., the many-to-one mapping B degenerates into a one-to-one mapping and every frame is accepted by the algorithm [13].…”
Section: The Connectionist Temporal Classification-Convolutional Neur...
confidence: 99%
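Best-path decoding as described above is standard greedy CTC decoding: take the argmax label at each frame to get the maximum-probability path π, then apply the collapsing map B (merge repeats, drop blanks). A minimal sketch, with toy per-frame probabilities (not the paper's model outputs):

```python
def best_path_decode(frame_probs, blank=0):
    """Greedy CTC decoding: argmax per frame gives the maximum-probability
    path pi; the map B then merges repeated labels and removes blanks."""
    path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]
    decoded, prev = [], None
    for label in path:
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded

# Toy posteriors over {blank=0, 'a'=1, 'b'=2} for five frames
probs = [[0.1, 0.8, 0.1],
         [0.1, 0.8, 0.1],
         [0.9, 0.05, 0.05],
         [0.2, 0.1, 0.7],
         [0.2, 0.1, 0.7]]
print(best_path_decode(probs))   # -> [1, 2]: path "a a - b b" collapses to "ab"
```

Note that greedy decoding is only an approximation: the most probable label sequence can differ from the collapse of the single most probable path, which is the simplifying assumption the quote describes.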
“…The forward-backward algorithm is calculated as follows: for an input sequence x of length T and a label sequence l, the extended label sequence is l′, with length |l′| = 2|l| + 1. The forward probability of outputting the extended label at the s-th position at time t is α(t, s), and the posterior probability of the label sequence is given by Eq. (5) [12].…”
Section: Core Idea Of Ctc
confidence: 99%
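The paper's Eq. (5) is not reproduced in this excerpt. For reference, the standard CTC forward recursion that the quoted definitions of l′ and α(t, s) correspond to is:

```latex
\alpha(t,s) =
\begin{cases}
\bigl(\alpha(t-1,s)+\alpha(t-1,s-1)\bigr)\, y^{t}_{l'_{s}}
  & \text{if } l'_{s}=\mathrm{blank} \ \text{or}\ l'_{s}=l'_{s-2},\\[4pt]
\bigl(\alpha(t-1,s)+\alpha(t-1,s-1)+\alpha(t-1,s-2)\bigr)\, y^{t}_{l'_{s}}
  & \text{otherwise,}
\end{cases}
```

with the posterior probability of the label sequence obtained as $p(l \mid x) = \alpha(T, |l'|) + \alpha(T, |l'|-1)$, where $y^{t}_{k}$ denotes the network's output probability for label $k$ at time $t$.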
“…The loss function of CTC is defined as the negative log probability of the label sequences over the training set S. The loss for each sample x is then given by Eq. (12).…”
Section: Core Idea Of Ctc
confidence: 99%
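The paper's Eq. (12) is likewise not reproduced in the excerpt. The standard CTC objective it describes, the negative log probability over the training set S, is:

```latex
\mathcal{L}(S) = -\sum_{(x,\,l)\in S} \ln p(l \mid x),
```

so minimizing $\mathcal{L}(S)$ is equivalent to maximizing the likelihood of the correct label sequences, with each sample contributing the per-example loss $-\ln p(l \mid x)$.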