2015
DOI: 10.1186/s13636-015-0068-3

Phone recognition with hierarchical convolutional deep maxout networks

Abstract: Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) both on low-resource and on large-scale speech tasks. Experiments indicate that convolutional networks can attain a 10-15% relative improvement in the word error rate of large vocabulary recognition tasks over fully connected deep networks. Here, we explore some refinements to CNNs that have not been pursued by other authors. First, the CNN papers published up till now used sigmoid or r…

Cited by 75 publications (45 citation statements)
References 37 publications
“…The results we obtain, although comparable with the state-of-the-art in semi-supervised learning, are not comparable with the current state-of-the-art in phone recognition on the TIMIT database which is 16.5% Phone Error Rate (or, equivalently, 83.5% accuracy) [38]. The reason for this is twofold: Firstly, the above results only use a fraction of the labels provided by the TIMIT database for training.…”
Section: Discussion (mentioning)
confidence: 66%
“…To our knowledge, the best performing method is based on hierarchical convolutional deep maxout networks and achieves 16.5% Phone Error Rate (or, equivalently, 83.5% accuracy) [38].…”
Section: Results (mentioning)
confidence: 99%
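The equivalence quoted in these statements (16.5% Phone Error Rate, 83.5% accuracy) is the relation accuracy = 100% - PER. A minimal sketch of how such a score is commonly computed follows, assuming PER is the edit distance (substitutions, deletions, insertions) between reference and hypothesis phone sequences divided by the reference length. The quoted statements do not spell out their scoring procedure, and the function names and toy phone sequences here are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))  # distances against an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]  # deleting the first i reference phones
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference phone
                cur[j - 1] + 1,          # insertion of a hypothesis phone
                prev[j - 1] + (r != h),  # match or substitution
            ))
        prev = cur
    return prev[-1]


def phone_error_rate(ref, hyp):
    """PER in percent, normalised by the reference length (assumed convention)."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Toy example (hypothetical sequences): one substitution and one deletion.
ref = ["sil", "k", "ae", "t", "sil"]
hyp = ["sil", "k", "eh", "t"]
per = phone_error_rate(ref, hyp)
accuracy = 100.0 - per
```

Under this convention, insertions can push PER above 100%, so the accuracy figure is not bounded below by zero.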
“…Decoding and evaluation was performed by applying a modified version of HTK [27]. We employed our custom neural network implementation, which achieved outstanding results earlier on several datasets (e.g., [29,30]). Following preliminary tests, we opted for five hidden layers, each one containing 1000 rectified neurons, and we applied the softmax activation function in the output layer.…”
Section: Methods (mentioning)
confidence: 99%
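The topology described in this statement (five hidden layers of 1000 rectified units with a softmax output) can be sketched as a plain NumPy forward pass. This is an illustrative sketch only, not the citing authors' custom implementation: the input dimension, the number of output classes, and the weight initialisation are assumptions, since the statement does not give them.

```python
import numpy as np


def relu(x):
    # Rectified linear activation used in the hidden layers.
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def init_params(n_in, n_out, n_hidden=1000, n_layers=5, seed=0):
    # He-style initialisation is an assumption; the statement does not specify one.
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / a), size=(a, b)), np.zeros(b))
        for a, b in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    h = x
    for W, b in params[:-1]:   # five rectified hidden layers
        h = relu(h @ W + b)
    W, b = params[-1]          # softmax output layer
    return softmax(h @ W + b)


# Hypothetical sizes: 40 input features per frame, 61 output classes.
N_IN, N_OUT = 40, 61
params = init_params(N_IN, N_OUT)
probs = forward(params, np.zeros((3, N_IN)))  # batch of 3 dummy frames
```

Each row of `probs` is a probability distribution over the output classes for one input frame.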
“…Lately, ASR systems have become much more accurate and robust thanks to deep neural networks (DNNs) [16][17][18]. We used scripts provided with the Kaldi toolkit [19] for training DNN-based ASR systems and the IRSTLM tool [20] for building language models.…”
Section: ASR System Development Details (mentioning)
confidence: 99%