Other methods for analyzing NLP models include (i) inspecting the mechanisms a model uses to encode information, e.g., attention weights (Voita et al., 2018; Raganato and Tiedemann, 2018; Voita et al., 2019b; Clark et al., 2019; Kovaleva et al., 2019) or individual neurons (Karpathy et al., 2015; Pham et al., 2016; Bau et al., 2019), and (ii) examining model predictions using manually defined templates, either to evaluate sensitivity to specific grammatical errors (Linzen et al., 2016; Gulordava et al., 2018; Tran et al., 2018; Marvin and Linzen, 2018) or to probe what language models know when applying them as knowledge bases or in QA settings (Radford et al., 2019; Petroni et al., 2019; Poerner et al., 2019; Jiang et al., 2019). An information-theoretic view on the analysis of NLP models was previously taken by Voita et al. (2019a), who explain how representations in the Transformer evolve between layers under different training objectives.