2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference On 2019
DOI: 10.1109/hpcc/smartcity/dss.2019.00309
Word Image Representation Based on Sequence to Sequence Model with Attention Mechanism for Out-of-Vocabulary Keyword Spotting

Cited by 3 publications (4 citation statements); References 38 publications
“…The sequence-to-sequence architecture has led to state-of-the-art results in Natural Language Processing, where it is used to translate an input sequence into an output sequence of, in general, a different length. The Seq2Seq architecture has recently started to be used in HTR and KWS as well [26,27].…”
Section: Related Work
confidence: 99%
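The attention mechanism named in the paper's title can be illustrated with a minimal dot-product attention step in NumPy. This is a generic sketch of the technique, not the paper's actual model; the function name and dimensions are assumptions for illustration.

```python
import numpy as np

def dot_product_attention(decoder_state, encoder_states):
    """One attention step over encoder timesteps (illustrative sketch).

    decoder_state:  (d,)   current decoder hidden state
    encoder_states: (T, d) one hidden state per encoder timestep
    """
    scores = encoder_states @ decoder_state   # (T,) similarity of each timestep
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    context = weights @ encoder_states        # (d,) attention-weighted summary
    return context, weights

rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))   # 5 encoder timesteps, hidden size 8
dec = rng.standard_normal(8)        # one decoder hidden state
ctx, w = dot_product_attention(dec, enc)
```

The decoder repeats this step at every output character, so the output sequence length is decoupled from the input length, which is the property the quoted statement highlights.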
“…With the recent advent of deep learning-based KWS methods, a standard solution for such model architectures is to normalize input images to a fixed size [69,104,121,166]. For instance, Wei and co-workers propose normalizing by resizing all input images to a standard size of 310 pixels in width and 50 in height in [121], whereas in [120] they resize all input images so that they share the same width (either directly or by padding with white pixels) and aspect ratio. Wicht et al. [150,151] normalize the word images to remove the skew and slant of the text using [228].…”
Section: Normalization
confidence: 99%
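The fixed-size normalization described above can be sketched with a nearest-neighbour resize in plain NumPy. The 50 × 310 target comes from the quoted statement; the sampling scheme here is an illustrative stand-in for whatever interpolation [121] actually uses.

```python
import numpy as np

def normalize_word_image(img, target_h=50, target_w=310):
    """Resize a grayscale word image to a fixed size by nearest-neighbour
    sampling (illustrative; the cited work may use other interpolation)."""
    h, w = img.shape
    rows = np.arange(target_h) * h // target_h   # source row for each target row
    cols = np.arange(target_w) * w // target_w   # source column for each target column
    return img[np.ix_(rows, cols)]

# a synthetic 120 x 400 grayscale "word image"
word = (np.arange(120 * 400) % 256).astype(np.uint8).reshape(120, 400)
fixed = normalize_word_image(word)   # every input now becomes 50 x 310
```

The padding variant mentioned in [120] would instead pad narrow images with white pixels up to a common width, preserving the aspect ratio rather than distorting it.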
“…The middle zone is modeled using an HMM, whereas the upper/lower zones are used to train similar models. Feed-forward networks that include convolutional layers in their architecture have been used. These networks typically work either by producing in their output a suitable descriptor of the input word image [4,100,101,159,173,174], or by using network layer activations to create input word image descriptors [32,34,37,74,120,153,180,235]. Again, a typical distance used is the Euclidean.…”
Section: Word-to-Word Matching
confidence: 99%
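The word-to-word matching step described above, comparing fixed-length word image descriptors with the Euclidean distance, can be sketched as follows. The descriptors here are tiny toy vectors, not actual network outputs.

```python
import numpy as np

def rank_by_euclidean(query_desc, gallery_descs):
    """Rank gallery word images by Euclidean distance between their
    descriptors and the query descriptor (smaller = better match)."""
    dists = np.linalg.norm(gallery_descs - query_desc, axis=1)
    return np.argsort(dists), dists

gallery = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 4.0]])  # toy 2-D descriptors
query = np.array([0.9, 1.1])
order, dists = rank_by_euclidean(query, gallery)
# order[0] == 1: the descriptor [1.0, 1.0] is the closest match
```

In a real KWS pipeline the descriptors would be the network outputs or layer activations mentioned in the quote, and the ranked list would be returned for the spotted keyword.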