Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, 2017
DOI: 10.18653/v1/e17-1104
Very Deep Convolutional Networks for Text Classification

Abstract: The dominant approach for many NLP tasks are recurrent neural networks, in particular LSTMs, and convolutional neural networks. However, these architectures are rather shallow in comparison to the deep convolutional networks which have pushed the state-of-the-art in computer vision. We present a new architecture (VD-CNN) for text processing which operates directly at the character level and uses only small convolutions and pooling operations. We are able to show that the performance of this model increases wit…
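The abstract describes a network that operates directly on characters and stacks small convolutions. A minimal sketch of such a character-level front end, in NumPy, is shown below; the alphabet, embedding size, and filter counts are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

# Hypothetical sketch of a character-level conv front end: text is mapped
# to character embeddings, then passed through small (width-3) 1-D
# convolutions with a ReLU, as in VDCNN-style models. All sizes here are
# assumptions for illustration.

ALPHABET = "abcdefghijklmnopqrstuvwxyz "   # toy alphabet (assumption)
EMBED_DIM = 16                             # embedding size (assumption)

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(ALPHABET), EMBED_DIM))

def encode(text):
    """Map a string to a (seq_len, EMBED_DIM) matrix of char embeddings."""
    idx = [ALPHABET.index(c) for c in text.lower() if c in ALPHABET]
    return embeddings[idx]

def conv1d(x, kernels):
    """Valid 1-D convolution: x is (T, C_in), kernels is (K, C_in, C_out)."""
    K, _, C_out = kernels.shape
    T = x.shape[0] - K + 1
    out = np.empty((T, C_out))
    for t in range(T):
        # each output position sees a small K-character window
        out[t] = np.tensordot(x[t:t + K], kernels, axes=([0, 1], [0, 1]))
    return np.maximum(out, 0.0)  # ReLU

kernels = rng.normal(size=(3, EMBED_DIM, 32))  # small width-3 filters
features = conv1d(encode("very deep networks"), kernels)
print(features.shape)  # (16, 32): one 32-dim feature per 3-char window
```

In the full architecture, blocks like this are stacked (with pooling between them) so that deeper layers see progressively longer character spans.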

Cited by 776 publications (594 citation statements)
References 18 publications
“…Fast Fourier Transform (FFT) provides an alternative approach to calculate the correlation coefficient with a high computational speed as compared to Equation (6) [65,66]. The correlation coefficient between A and B is computed by locating the maximum value of the following equation:…”
Section: Correlation Coefficient
confidence: 99%
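The quoted passage describes computing a correlation coefficient efficiently via the Fast Fourier Transform and locating the maximum of the result. A sketch of that idea (not the cited authors' code) using the convolution theorem, corr(A, B) = IFFT(FFT(A) · conj(FFT(B))), is:

```python
import numpy as np

# Illustrative FFT-based cross-correlation: correlate two signals in the
# frequency domain, then locate the peak of the result. The signals and
# the shift of 37 samples are made-up test data.

def fft_xcorr(a, b):
    """Circular cross-correlation of equal-length real signals a and b."""
    A = np.fft.rfft(a)
    B = np.fft.rfft(b)
    return np.fft.irfft(A * np.conj(B), n=len(a))

rng = np.random.default_rng(1)
a = rng.normal(size=256)
b = np.roll(a, 37)            # b is a copy of a shifted by 37 samples

xc = fft_xcorr(b, a)          # correlate shifted signal against original
lag = int(np.argmax(xc))      # peak location recovers the shift
print(lag)                    # 37
```

The frequency-domain product costs O(N log N) rather than the O(N²) of a direct sliding-window correlation, which is the speedup the passage refers to.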
“…Deep convolutional neural networks (CNNs) recently have shown remarkable success in a variety of areas such as computer vision [1][2][3] and natural language processing [4][5][6]. CNNs are biologically inspired by the structure of mammals' visual cortexes as presented in Hubel and Wiesel's model [7].…”
Section: Introduction
confidence: 99%
“…Deep learning model architectures are designed based on the learning task, the number of parameters, and the size of the dataset. Well-known deep learning models from computer vision, e.g., ResNet and VGGNet [2], have been reused to build advanced systems for text processing, such as the Very Deep Convolutional Network (VDCNN) [5], which operates directly at the character level. Text modeling and sentence classification have also been tackled with a small number of convolution layers, such as one, two, and six layers [6][7][8].…”
confidence: 99%
“…The character-level CNN model with six convolutional layers was outperformed by a bag-of-words model on three out of four data sets for topic classification. The 29-layer model of Conneau et al. (2016) improves the results; however, the CNN performs better than the bag-of-words model on only two of the four data sets. Wang et al. (2015) developed a CNN with one convolutional layer and a layer that extracts several representations of the texts by applying multiple windows of various widths over the pre-trained word embeddings.…”
Section: Topic and Question Classification
confidence: 90%
“…For one of the data sets, a bag-of-words model outperformed the character-level CNN, while for the other three data sets the CNN showed better performance. Conneau et al. (2016) showed that performance is further improved with a CNN containing 29 convolutional layers. This CNN also performs better than the bag-of-words model on the data set for which the bag-of-words model outperformed the smaller CNN.…”
Section: Sentiment Classification
confidence: 99%