2021
DOI: 10.1109/taslp.2021.3120632

Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments

Cited by 12 publications (4 citation statements)
References 41 publications

“…During the inference stage, we retrieve keywords within the ASR 2-best hypotheses. During KWS scoring, a predicted keyword occurrence is considered correct when there is at least a 50% time overlap between the predicted occurrence and a reference occurrence of the same keyword [36]. The results are shown in Table 8.…”
Section: Speaker Diarization (mentioning)
confidence: 99%
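The excerpt above describes the KWS matching rule: a hypothesized keyword occurrence counts as correct when it overlaps a reference occurrence of the same keyword by at least 50% in time. The following is a minimal Python sketch of that rule, not the scoring code of the cited work; measuring the overlap against the reference occurrence's duration and the greedy one-to-one matching are assumptions made here for illustration.

```python
# Minimal sketch of the 50%-time-overlap matching rule described above.
# Assumption (not specified in the excerpt): the overlap ratio is measured
# relative to the reference occurrence's duration.

def overlaps_enough(pred_start, pred_end, ref_start, ref_end, min_ratio=0.5):
    """Return True if the predicted interval covers at least `min_ratio`
    of the reference interval in time (times in seconds)."""
    intersection = max(0.0, min(pred_end, ref_end) - max(pred_start, ref_start))
    ref_duration = ref_end - ref_start
    return ref_duration > 0 and intersection / ref_duration >= min_ratio


def count_correct_hits(predictions, references):
    """Count predicted keyword occurrences that match a reference occurrence
    of the same keyword with at least 50% time overlap.

    Both arguments are lists of (keyword, start_sec, end_sec) tuples.
    Each reference occurrence is matched at most once (greedy matching,
    an assumption for this sketch).
    """
    unmatched_refs = list(references)
    correct = 0
    for kw, p_start, p_end in predictions:
        for i, (r_kw, r_start, r_end) in enumerate(unmatched_refs):
            if kw == r_kw and overlaps_enough(p_start, p_end, r_start, r_end):
                correct += 1
                del unmatched_refs[i]
                break
    return correct


# Example: one hit that overlaps the reference by ~67% and one false alarm.
preds = [("budget", 1.0, 1.6), ("budget", 5.0, 5.4)]
refs = [("budget", 1.1, 1.7)]
print(count_correct_hits(preds, refs))  # -> 1
```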
“…Each dot represents a language represented in the corpus. In [18], we improved the alignments and phonemic transcripts of 48 languages in the companion corpus by using the zero-resource acoustic modelling approaches discussed in this dissertation [19][20][21][22]. This allowed for the first time the systematic investigation of phonetic typology across a wide range of languages.…”
Section: Introduction (mentioning)
confidence: 99%
“…Automatic speech recognition (ASR) has a long history of research (Bahl et al., 1983; Hinton et al., 2012; Chu et al., 2020). Through audio signal processing and modeling, speech content can be transcribed into text for various applications (Yu and Deng, 2016; Yang et al., 2021). Yet in particular cases, the audio signals cannot be clearly produced or captured.…”
Section: Introduction (mentioning)
confidence: 99%