2017
DOI: 10.48550/arxiv.1701.02720
Preprint

Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks

Cited by 45 publications (41 citation statements)
References 16 publications
“…During the last decade, deep neural networks (DNNs) have achieved wide success in automatic speech recognition. Many architectures, such as recurrent (RNN) [34,15,1,31,13], time-delay (TDNN) [39,28], and convolutional neural networks (CNN) [42], have been proposed and have outperformed traditional hidden Markov models (HMMs) combined with Gaussian mixture models (GMMs) on various speech recognition tasks. However, despite this evolution of models and paradigms, the acoustic feature representation has remained almost the same.…”
Section: Introduction
confidence: 99%
“…We designed the XNE around a lean hardware engine focused on executing the feature loops of Listing 1. We execute these as hardwired inner loops, operating in principle on fixed-size input tiles in a fixed number of cycles. A design-time throughput parameter (TP) defines the size of each tile, which is also the number of simultaneous XNOR operations the datapath can execute per cycle; every TP cycles, the accelerator consumes one set of TP binary input pixels and TP sets of TP binary weights to produce one set of TP output pixels.…”
Section: Accelerator Architecture
confidence: 99%
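
To make the TP-wide datapath concrete, here is a minimal C sketch of the XNOR-and-popcount arithmetic the excerpt describes. This is not the XNE's RTL or its Listing 1 (which is not reproduced in this excerpt); TP = 32 and the function names are assumptions made for illustration, with bits encoding +1/−1 values as is conventional for binarized networks.

```c
#include <stdint.h>

/* TP is a design-time throughput parameter; 32 is a hypothetical value. */
#define TP 32

/* One XNOR "multiply" word: with +1/-1 values packed as bits,
 * binary multiplication is XNOR, and popcount counts the +1 products. */
static inline int xnor_popcount(uint32_t in, uint32_t w) {
    return __builtin_popcount(~(in ^ w));
}

/* Hypothetical model of one tile: over TP cycles, consume one set of TP
 * binary input pixels and TP sets of TP binary weights, producing one
 * set of TP (integer) output accumulations, one per cycle. */
void xne_tile(uint32_t in, const uint32_t w[TP], int32_t acc[TP]) {
    for (int cycle = 0; cycle < TP; cycle++) {
        int pc = xnor_popcount(in, w[cycle]);
        acc[cycle] += 2 * pc - TP;  /* map match count to signed +/-1 sum */
    }
}
```

In hardware, each loop iteration corresponds to one cycle of TP parallel XNOR gates feeding a popcount tree, which is why the tile completes in a fixed number of cycles.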
“…Once an output feature vector has been produced by the XNE datapath, it is fully computed and is never used again. With the microcoding strategy proposed in Listing 3, a single input feature vector has to be reloaded fs² times, after which it is completely consumed.…”
Section: Microcode Processor
confidence: 99%
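
The reuse pattern described here can be illustrated with a simple loop nest. The sketch below is not the paper's Listing 3 (not reproduced in this excerpt) but a plain output-stationary convolution, assuming one scalar per feature-map element and a hypothetical filter size FS = 3; it shows why each output is written exactly once while each interior input element is re-read fs² times, once per filter tap.

```c
#include <stdint.h>

#define FS 3  /* hypothetical filter size fs */

/* Output-stationary fs x fs convolution over an H x W feature map.
 * Each output is fully computed, written once, and never read back;
 * each interior input element is reloaded FS*FS times, once for each
 * of the FS*FS output positions it contributes to. */
void conv_fs2_reuse(int H, int W,
                    const int32_t *in, const int32_t *w, int32_t *out)
{
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++) {
            int32_t acc = 0;
            for (int fy = 0; fy < FS; fy++)
                for (int fx = 0; fx < FS; fx++) {
                    int iy = y + fy - FS / 2;  /* zero-padded borders */
                    int ix = x + fx - FS / 2;
                    if (iy >= 0 && iy < H && ix >= 0 && ix < W)
                        acc += in[iy * W + ix] * w[fy * FS + fx];
                }
            out[y * W + x] = acc;  /* produced once, never used again */
        }
}
```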