2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI: 10.1109/icassp.2014.6855088

Investigation of maxout networks for speech recognition

Abstract: We explore the use of maxout neurons in various aspects of acoustic modelling for large vocabulary speech recognition systems, including low-resource scenarios and multilingual knowledge transfer. Through experiments on voice search and short message dictation datasets, we found that maxout networks are around three times faster to train and offer lower or comparable word error rates on several tasks, when compared to networks with the logistic nonlinearity. We also present a detailed study of the maxout un…
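
For readers unfamiliar with the activation the abstract refers to, here is a minimal sketch of a maxout unit: each output is the maximum over k affine "pieces" of the input. The shapes, names, and the NumPy implementation are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def maxout(x, W, b):
    """Maxout activation: for each output unit, take the max over k
    affine pieces. Shapes (illustrative):
      x: (d_in,)           input vector
      W: (k, d_out, d_in)  weights for the k pieces
      b: (k, d_out)        biases for the k pieces
    Returns (d_out,): h_j = max_i (W[i] @ x + b[i])_j.
    """
    z = np.einsum('kod,d->ko', W, x) + b  # (k, d_out) pre-activations
    return z.max(axis=0)                  # elementwise max over pieces

# Tiny usage example with random parameters
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
W = rng.standard_normal((2, 3, 5))  # k=2 pieces, 3 output units
b = rng.standard_normal((2, 3))
print(maxout(x, W, b))               # shape (3,)
```
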

Cited by 38 publications (26 citation statements) · References 25 publications (14 reference statements)
“…We have also tried to train a sigmoid [32] network, but the initial loss never decreased. Finally, as proposed by Swietojanski et al. [33], we have tested a combination of ReLU units for the first layers and maxout units for the last layers of the network.…”
Section: Methods
mentioning confidence: 99%
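
A hedged sketch of the ReLU-bottom / maxout-top arrangement this statement describes, in plain NumPy; the layer sizes and parameter shapes are assumptions for illustration, not the cited configuration.

```python
import numpy as np

def maxout(h, W, b):
    # W: (k, d_out, d_in), b: (k, d_out); elementwise max over k pieces
    return (np.einsum('kod,d->ko', W, h) + b).max(axis=0)

def hybrid_forward(x, relu_layers, maxout_layers):
    """Forward pass: ReLU in the first (bottom) layers, maxout in the
    last (top) layers, mirroring the combination the statement refers to."""
    h = x
    for W, b in relu_layers:       # bottom layers: ReLU
        h = np.maximum(W @ h + b, 0.0)
    for W, b in maxout_layers:     # top layers: maxout
        h = maxout(h, W, b)
    return h

# Usage with random parameters (shapes are arbitrary)
rng = np.random.default_rng(0)
x = rng.standard_normal(4)
relu_layers = [(rng.standard_normal((6, 4)), np.zeros(6))]
maxout_layers = [(rng.standard_normal((2, 3, 6)), np.zeros((2, 3)))]
print(hybrid_forward(x, relu_layers, maxout_layers))  # shape (3,)
```
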
“…The dropout method was shown to improve the generalization ability of neural networks by preventing the co-adaptation of units [28]. Dropout is now routinely used in the training of DNNs for speech recognition, and some researchers have already reported that it works nicely with maxout units as well [19,29]. We also find it to yield a significant performance gain.…”
Section: Introduction
mentioning confidence: 56%
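
As a concrete illustration of combining the two techniques, here is a minimal inverted-dropout sketch applied to maxout-layer activations; the rate and shapes are arbitrary assumptions, not values from the cited work.

```python
import numpy as np

def dropout(h, p, rng, train=True):
    """Inverted dropout: zero each unit with probability p during
    training and rescale by 1/(1-p), so test-time activations need
    no correction."""
    if not train or p == 0.0:
        return h
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = rng.standard_normal(8)          # stand-in for maxout activations
print(dropout(h, p=0.5, rng=rng))   # roughly half the units zeroed
```
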
“…This activation function can be regarded as a generalization of the rectifier function [16], and so far, only a few studies have attempted to apply maxout networks to speech recognition tasks. These all found that maxout nets slightly outperformed ReLU networks, in particular under low-resource conditions [17][18][19]. Here, we show that the pooling procedure applied in CNNs and the pooling step of the maxout function are practically the same, and hence, it is trivial to combine the two techniques and construct convolutional networks out of maxout neurons.…”
Section: Introduction
mentioning confidence: 91%
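
The claimed equivalence can be made concrete: CNN max-pooling takes a max over neighbouring spatial positions, while maxout takes a max over groups of feature maps at the same position. A sketch under assumed 1-D shapes (not the cited implementation):

```python
import numpy as np

def spatial_max_pool(fmap, size):
    """CNN-style max-pooling over non-overlapping windows along the
    spatial axis. fmap: (channels, width)."""
    c, w = fmap.shape
    return fmap[:, :w - w % size].reshape(c, -1, size).max(axis=2)

def maxout_pool(fmap, k):
    """Maxout viewed as pooling: max over groups of k feature maps
    (channels) at each position; the same max operation, applied
    along a different axis."""
    c, w = fmap.shape
    assert c % k == 0
    return fmap.reshape(c // k, k, w).max(axis=1)
```
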
“…Maxout units are adopted to improve performance compared with ReLU, except in the first layers: as suggested in [21], the bottom layers should be replaced by layers with a smaller number of ReLU units. The configuration is described as follows:…”
Section: Basic Experiments of DMCN
mentioning confidence: 99%