2010
DOI: 10.1109/jproc.2010.2050411

I2T: Image Parsing to Text Description

Abstract: In this paper, we present an image parsing to text description (I2T) framework that generates text descriptions of image and video content based on image understanding. The proposed I2T framework follows three steps: 1) Input images (or video frames) are decomposed into their constituent visual patterns by an image parsing engine, in a spirit similar to parsing sentences in natural language. 2) The image parsing results are converted into a semantic representation in the form of Web Ontology Language (OWL)…
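To make the abstract's three-step pipeline concrete, here is a minimal, hypothetical Python sketch: a hand-made parse graph stands in for the image parsing engine, plain tuples stand in for OWL triples, and a one-line template fills in for the text generation engine. None of the node names, predicates, or templates below come from the paper, which uses and-or graph parsing and a full OWL knowledge base.

```python
# Toy illustration of the I2T pipeline; all names are hypothetical.

# Step 1 (stand-in): a hand-made "parse graph" of one image, i.e. the
# kind of output an image parsing engine might produce.
parse_graph = {
    "objects": ["sky", "water", "boat"],
    "relations": [("boat", "floats_on", "water"),
                  ("sky", "is_above", "water")],
}

# Step 2: convert the parse graph into OWL-like subject-predicate-object
# triples (shown here as plain tuples rather than real RDF/XML).
def to_triples(graph):
    triples = [(obj, "rdf:type", "scene:Object") for obj in graph["objects"]]
    triples += [(s, f"scene:{p}", o) for s, p, o in graph["relations"]]
    return triples

# Step 3: a trivial template-based text generator over the triples;
# type triples are skipped, relation triples become one sentence each.
def describe(triples):
    sentences = []
    for s, p, o in triples:
        if p.startswith("scene:"):
            verb = p.split(":")[1].replace("_", " ")
            sentences.append(f"The {s} {verb} the {o}.")
    return " ".join(sentences)

if __name__ == "__main__":
    print(describe(to_triples(parse_graph)))
    # -> "The boat floats on the water. The sky is above the water."
```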

Cited by 246 publications (116 citation statements)
References 69 publications
“…Yao et al. [26] look at the problem of generating text with a comprehensive system built on various hierarchical knowledge ontologies and using a human in the loop for hierarchical image parsing (except in specialized circumstances). In contrast, our work automatically mines knowledge about textual representation, and parses images fully automatically, without a human operator, and with a much simpler approach overall.…”
Section: Related Work (mentioning, confidence: 99%)
“…Instead of humans and their activities, they focused on the detection of objects, their inter-relations, and events in videos. Yao et al. presented their work on video-to-text description [6]; this work depended on a significant amount of annotated data, a requirement that is avoided in this paper. Yang et al. developed a framework for generating textual descriptions of static images, dealing with images containing up to two objects [7].…”
Section: Related Work (mentioning, confidence: 99%)
“…This approach overcomes some of the limitations of Hidden Markov Models and Dynamic Bayesian Networks, because not only the model parameters but also the model structures are learned. In [11], and-or graphs are used to generate text descriptions for a large dataset containing many different types of videos and images. Note that these last four studies are not just concerned with classifying activities.…”
Section: Related Work (mentioning, confidence: 99%)
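Since the excerpts above turn on and-or graph parsing, a minimal sketch may help: AND nodes decompose a scene into its parts, OR nodes choose among alternative interpretations, and sampling the graph yields one concrete parse whose terminals can feed a text generator. The graph contents and node names below are hypothetical, not taken from [11] or the I2T paper.

```python
# Minimal and-or graph sketch: AND nodes expand all children,
# OR nodes select one alternative. All node names are hypothetical.
import random

graph = {
    "Scene":      ("AND", ["Background", "Foreground"]),
    "Background": ("OR",  ["sky", "indoor wall"]),
    "Foreground": ("AND", ["person", "Action"]),
    "Action":     ("OR",  ["walking", "sitting"]),
}

def sample(node):
    """Expand a node into terminal symbols: AND nodes expand every
    child; OR nodes pick one alternative (uniformly at random here)."""
    if node not in graph:  # terminal symbol
        return [node]
    kind, children = graph[node]
    if kind == "AND":
        return [t for child in children for t in sample(child)]
    return sample(random.choice(children))

print("Parse:", sample("Scene"))  # e.g. ['sky', 'person', 'walking']
```

In a real system the OR-branch choice would be driven by image evidence rather than random sampling, and the resulting terminals would be grounded in the semantic representation before text generation.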