2016
DOI: 10.1007/978-3-319-46448-0_8
|View full text |Cite
|
Sign up to set email alerts
|

Semantic Object Parsing with Graph LSTM

Abstract: By taking the semantic object parsing task as an exemplar application scenario, we propose the Graph Long Short-Term Memory (Graph LSTM) network, which is the generalization of LSTM from sequential data or multidimensional data to general graph-structured data. Particularly, instead of evenly and fixedly dividing an image to pixels or patches in existing multi-dimensional LSTM structures (e.g., Row, Grid and Diagonal LSTMs [1][2]), we take each arbitrary-shaped superpixel as a semantically consistent node, and… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
5

Citation Types

1
197
0

Year Published

2017
2017
2022
2022

Publication Types

Select...
4
3
2

Relationship

1
8

Authors

Journals

citations
Cited by 299 publications
(198 citation statements)
references
References 49 publications
1
197
0
Order By: Relevance
“…Recurrent Neural Networks (RNN) have been widely used to address the sequential prediction problems in literatures, such as speech recognition [9], human action/activity recognition [34], [41], scene labeling [39], [45], image caption [19], and object segmentation [31]. RNN and its variants LSTM [34], GRNN [4], etc.…”
Section: Related Workmentioning
confidence: 99%
“…Recurrent Neural Networks (RNN) have been widely used to address the sequential prediction problems in literatures, such as speech recognition [9], human action/activity recognition [34], [41], scene labeling [39], [45], image caption [19], and object segmentation [31]. RNN and its variants LSTM [34], GRNN [4], etc.…”
Section: Related Workmentioning
confidence: 99%
“…Hierarchical/Graphical Models in Computer Vision: Hierarchical/graphical models are powerful for building structured representations, which can reflect task-specific relations and constraints. From early distributional semantic models, part-based models [16,17], MRF/CRF [31], And-Or grammar model [59], to deep structural networks [30,15], graph neural networks [20], trainable CRF [79], etc., hierarchical/graphical models have found applications in a wide variety of core computer vision tasks, such as object recognition [55], human parsing [40,41,81], pose estimation [34,66,61,68,35], visual dialog etc., to the extent that they are now ubiquitous in the field. Inspired by their general success, we leverage structural information to design our approach.…”
Section: Related Workmentioning
confidence: 99%
“…With the renaissance of connectionism in the computer vision community, recent research efforts take deep neural networks as their main building blocks [70,54,78,50,49,77,45]. More specifically, some efforts address the task as an active template regression problem [39], propagate semantic information from a retrieved, annotated image corpus [44], merge multi-level image context in a unified convolutional neural network [42], or use Graph LSTMs to model human configurations [40,41]. Some others leverage extra pose information to assist the task [72,22,71,14,54].…”
Section: Related Workmentioning
confidence: 99%
“…Human-image parsing is a specific semantic object segmentation task for which various CNN-based methods have been proposed [15][16][17][18][19][20]. In particular, some CNN-based methods use training datasets with different domains.…”
Section: Related Workmentioning
confidence: 99%