Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence 2019
DOI: 10.24963/ijcai.2019/137

Densely Supervised Hierarchical Policy-Value Network for Image Paragraph Generation

Abstract: Image paragraph generation aims to describe an image with a paragraph in natural language. Compared to image captioning with a single sentence, paragraph generation provides a more expressive and fine-grained description for storytelling. Existing approaches mainly optimize the paragraph generator by minimizing word-wise cross-entropy loss, which neglects the linguistic hierarchy of a paragraph and results in "sparse" supervision for generator learning. In this paper, we propose a novel Densely Supervised Hierarc…

Cited by 13 publications (15 citation statements)
References 8 publications
“…We compare our HSGED(SLL) with several state-of-the-art models: Regions-Hierarchical [13], RTT-GAN [17], DCPG [5], HCAVP [46], DHPV [36], CAE-LSTM [34], TDPG [23] and CRL [22]. Among these methods, RTT-GAN, DCPG, Regions-Hierarchical, DHPV, Hierarchical CAVP and CAE-LSTM use HRNNs with different technique details.…”
Section: Comparing Methods (mentioning)
confidence: 99%
“…Researchers also propose advanced techniques to refine the prototypical HRNN, e.g., generative models like GAN [17] or VAE [5] for stronger consistency; the trigram repetition penalty based sampling method for diversity [23]. Besides, dense sentence-level rewards [36] and curiosity-driven reinforcement learning [22] are used for more robust training, all of which could also be applied in our proposed framework, HSGED. However, most of them are built without enough hierarchical constraints, so the qualities of the generated paragraphs are unsatisfactory.…”
Section: Related Work (mentioning)
confidence: 99%
“…Our GeoRic dataset consists of 29,038 images from the Geograph project website, with captions and location coordinates. We selected captions that are exactly one sentence long (multi-sentence caption generation, although a promising research direction (Mao et al., 2018; Wu et al., 2019), is not addressed in this work) and include at least one spatial expression, such as "near", "north of", "across", etc. (in order to ensure that the captions contain enough geographic referencing).…”
Section: The Georic Dataset (mentioning)
confidence: 99%
“…Besides, dense sentence-level rewards [49] and curiosity-driven reinforcement learning [50] are used for more robust training, all of which could also be applied in our proposed framework, HSGED. However, most of them are built without enough 2.3.…”
Section: Image Paragraph Captioning (mentioning)
confidence: 99%