Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/p19-1180
Visually Grounded Neural Syntax Acquisition

Abstract: We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without explicit supervision. The model learns by looking at natural images and reading paired captions. VG-NSL generates constituency parse trees of texts, recursively composes representations for constituents, and matches them with images. We define the concreteness of constituents by their matching scores with images, and use it to guide the parsing of text. Experiments on the M…
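The abstract describes parsing guided by constituent concreteness scores. Below is a minimal sketch of the idea, not the paper's implementation: a greedy bottom-up parser that repeatedly merges the adjacent span pair with the highest score. The `toy_concreteness` dictionary and the averaging `score` function are hypothetical stand-ins for the learned image-text matching scores in the actual model.

```python
def parse_by_concreteness(tokens, score):
    """Greedily build a binary constituency tree: at each step,
    merge the adjacent pair of spans whose combined span scores highest."""
    spans = [(tok,) for tok in tokens]  # word tuples covered by each node
    trees = list(tokens)                # partial trees, one per span
    while len(spans) > 1:
        # index of the adjacent pair with the highest combined score
        best = max(range(len(spans) - 1),
                   key=lambda i: score(spans[i] + spans[i + 1]))
        spans[best:best + 2] = [spans[best] + spans[best + 1]]
        trees[best:best + 2] = [(trees[best], trees[best + 1])]
    return trees[0]

# Toy scorer: pretend content words are more "concrete" (a stand-in
# for image-text matching scores learned by the real model).
toy_concreteness = {"a": 0.2, "cat": 1.0, "on": 0.1, "the": 0.2, "mat": 0.9}
score = lambda span: sum(toy_concreteness.get(w, 0.0) for w in span) / len(span)

tree = parse_by_concreteness(["a", "cat", "on", "the", "mat"], score)
```

With this toy scorer the parser first groups the high-concreteness pairs ("a cat", "the mat") before attaching the function word "on", illustrating how concreteness can steer bracketing decisions.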


Cited by 64 publications (93 citation statements)
References 55 publications
“…Shen et al. (2018, 2019) learn tree structures through soft gating layers within neural language models, while Drozdov et al. (2019) combine recursive autoencoders with the inside-outside algorithm. Kim et al. (2019) train unsupervised recurrent neural network grammars with a structured inference network to induce latent trees, and Shi et al. (2019) utilize image captions to identify and ground constituents.…”
Section: Related Work
confidence: 99%
“…The next step is to derive the query representation based on the recursively extracted constituent nodes in the LST. In previous work [4,36], only the last constituent node is used for task-specific inference. However, as mentioned previously, the complex query usually consists of multiple visual concepts and their reference descriptions, in which some concepts or reference descriptions may not have clear visual evidence or just have very short temporal durations in the videos.…”
Section: Tree-Augmented Query Encoder
confidence: 99%
“…Kim et al. (2019b) employ unsupervised recurrent neural network grammars, and Kim et al. (2019a) employ compound probabilistic context-free grammars. Shi et al. (2019) show how image captions can be successfully leveraged to identify constituents in sentences. None of these papers performs an explicit analysis of differences between languages.…”
Section: Related Work
confidence: 99%