Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015
DOI: 10.3115/v1/p15-2019

Learning language through pictures

Abstract: We propose IMAGINET, a model of learning visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings, and uses a multi-task objective by receiving a textual description of a scene and trying to concurrently predict its visual representation and the next word in the sentence. Mimicking an important aspect of human language learning, it acquires meaning representations for individual words from descriptions of visual scenes.
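The abstract describes the architecture only at a high level. Below is a minimal sketch of such a two-pathway, multi-task setup, assuming PyTorch; the class and parameter names (SharedGRUCaptioner, visual_dim, alpha) are illustrative rather than taken from the paper's released code, and mean-squared error stands in for whatever visual loss the authors actually used.

```python
# A minimal sketch of an IMAGINET-style multi-task model (assumed PyTorch).
# Two GRU pathways share one word-embedding table; one pathway predicts the
# next word, the other predicts the image feature vector of the scene.
import torch
import torch.nn as nn

class SharedGRUCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, visual_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)                  # shared word embeddings
        self.gru_lang = nn.GRU(embed_dim, hidden_dim, batch_first=True)   # language-model pathway
        self.gru_vis = nn.GRU(embed_dim, hidden_dim, batch_first=True)    # visual pathway
        self.next_word = nn.Linear(hidden_dim, vocab_size)                # next-word prediction head
        self.to_visual = nn.Linear(hidden_dim, visual_dim)                # image-feature prediction head

    def forward(self, tokens):
        emb = self.embed(tokens)                             # (batch, seq, embed_dim)
        lang_states, _ = self.gru_lang(emb)                  # per-step states for next-word prediction
        _, vis_final = self.gru_vis(emb)                     # final state summarizes the sentence
        word_logits = self.next_word(lang_states)            # (batch, seq, vocab)
        visual_pred = self.to_visual(vis_final.squeeze(0))   # (batch, visual_dim)
        return word_logits, visual_pred

def multitask_loss(word_logits, next_words, visual_pred, image_feats, alpha=0.5):
    """Weighted sum of next-word cross-entropy and visual-prediction error (assumed weighting)."""
    lm_loss = nn.functional.cross_entropy(
        word_logits.reshape(-1, word_logits.size(-1)), next_words.reshape(-1))
    vis_loss = nn.functional.mse_loss(visual_pred, image_feats)
    return alpha * lm_loss + (1 - alpha) * vis_loss
```

At test time only the textual pathway is needed; the image features act purely as a training signal, which is why later work (quoted below) treats them as privileged information.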

Cited by 52 publications (58 citation statements)
References 34 publications
“…This surprising result is largely due to the fact that the translators did not see the images while providing ground truth translations. More importantly, the effectiveness of visual information in machine translation in a privileged setting is also intuitive following the results of [5]. Chrupala et al [5] show that when image information is used as privileged information in the learning of word representations, the quality of such representations increases.…”
Section: Image Classification With Privileged Localization (mentioning)
confidence: 84%
“…Learning Language under Privileged Visual Information: Using images as privileged information to learn language is not new. Chrupała et al [5] used a multi-task loss while learning word embeddings under privileged visual information. The embeddings are trained for the task of predicting the next word, as well as the representation of the image.…”
Section: Related Work (mentioning)
confidence: 99%
“…Roy and Pentland, 2002; Yu and Ballard, 2004; Lazaridou et al., 2016). Chrupała et al. (2015) introduce a model that learns to predict the visual context from image captions. The model is trained on image-caption pairs from MSCOCO (Lin et al., 2014), capturing both rich visual input as well as larger scale input, but the language input still consists of word symbols.…”
Section: Related Work (mentioning)
confidence: 99%
“…The challenge with textual data is the discrete nature of the input: they use the Gumbel softmax trick (Jang et al., 2017) to generate word sequences which maximize activations for particular neurons. They apply this method to the Imaginet architecture of Chrupała et al. (2015) and confirm one of the findings in Kádár et al. (2017): that the language-model part of the Imaginet architecture is more sensitive to function words than the visual part, which tends to ignore them. They also carry out a separate quantitative evaluation of the synthetic patterns vs. corpus-attested ones in terms of achieved maximum activation.…”
Section: Saliency In Recurrent Network (mentioning)
confidence: 55%
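The Gumbel-softmax trick referred to in the statement above can be sketched as follows, again assuming PyTorch. The helper maximize_activation and the callback score_fn are hypothetical stand-ins for the citing paper's actual objective over Imaginet's hidden units; only the reparameterized sampling itself follows Jang et al. (2017).

```python
# A rough sketch of optimizing a discrete word sequence via the Gumbel-softmax
# relaxation, so that gradients can flow through an otherwise discrete choice.
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=1.0):
    """Draw a differentiable, approximately one-hot sample over the vocabulary."""
    gumbel_noise = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + gumbel_noise) / temperature, dim=-1)

def maximize_activation(score_fn, vocab_size, seq_len=5, steps=200, lr=0.1):
    """Optimize per-position logits so the sampled sequence maximizes score_fn.

    score_fn (hypothetical) maps a (seq_len, vocab_size) matrix of soft one-hot
    vectors to a scalar, e.g. the activation of one hidden unit of a trained RNN.
    """
    logits = torch.zeros(seq_len, vocab_size, requires_grad=True)
    opt = torch.optim.Adam([logits], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        soft_words = gumbel_softmax_sample(logits, temperature=0.5)
        loss = -score_fn(soft_words)       # gradient ascent on the target activation
        loss.backward()
        opt.step()
    return logits.argmax(dim=-1)           # hard word indices after optimization
```

The returned indices form the synthetic pattern whose maximum activation can then be compared against corpus-attested inputs, as the quoted evaluation does.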