2002
DOI: 10.1007/3-540-47979-1_7
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Abstract: We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and keywords supplied with the images is then learned using a method based around EM. This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken …
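The EM-based lexicon learning the abstract describes is analogous to IBM Model 1 word alignment: each training image pairs a set of region types ("blobs") with a set of keywords, and EM estimates p(word | blob). Below is a minimal sketch of that idea, assuming toy blob/keyword data; the names (`blob_sky`, etc.) are hypothetical and not from the paper's actual vocabulary.

```python
# Minimal EM sketch of lexicon learning between region types ("blobs") and
# keywords, in the style of IBM Model 1 alignment. Toy data; illustrative only.
from collections import defaultdict

def learn_lexicon(corpus, iterations=20):
    """corpus: list of (blobs, words) pairs, one per training image.
    Returns t[word][blob], an estimate of p(word | blob) learned by EM."""
    blobs = {b for bs, _ in corpus for b in bs}
    words = {w for _, ws in corpus for w in ws}
    # Uniform initialisation of translation probabilities.
    t = {w: {b: 1.0 / len(words) for b in blobs} for w in words}
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: softly align each keyword to the regions of its image.
        for bs, ws in corpus:
            for w in ws:
                z = sum(t[w][b] for b in bs)
                for b in bs:
                    c = t[w][b] / z
                    count[w][b] += c
                    total[b] += c
        # M-step: renormalise expected counts into probabilities.
        for w in count:
            for b in count[w]:
                t[w][b] = count[w][b] / total[b]
    return t

corpus = [
    (["blob_sky", "blob_grass"], ["sky", "grass"]),
    (["blob_sky", "blob_plane"], ["sky", "plane"]),
    (["blob_grass", "blob_tiger"], ["grass", "tiger"]),
]
t = learn_lexicon(corpus)
# Co-occurrence disambiguates: "plane" ends up most strongly tied to blob_plane.
best = max(t["plane"], key=t["plane"].get)
```

As in bitext lexicon learning, words that co-occur with many region types (like "sky") get their mass spread out, while words seen with a distinctive region concentrate on it.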

Cited by 1,158 publications (1,372 citation statements) · References 2 publications
“…The mapping of the descriptors to discrete indexes is performed according to a codebook C, which is typically learned from the local descriptors of the training images through k-means clustering (Duygulu et al., 2002; Jeon and Manmatha, 2004; Quelhas et al., 2005). The assignment of the weight p i of visterm i in image p is as follows:…”
Section: Image Representation
confidence: 99%
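The codebook step in the quoted passage can be sketched as follows: each local descriptor is quantised to its nearest codeword in a k-means-learned codebook C, and the image is represented by normalised visterm counts. This is a hedged sketch assuming the common term-frequency weighting p_i = n_i / Σ_j n_j; the 2-D descriptors and tiny codebook are toy values, not the cited systems' actual features.

```python
# Sketch of visterm assignment against a codebook C (assumed pre-learned,
# e.g. by k-means on training descriptors). Toy 2-D data for illustration.
import numpy as np

def assign_visterms(descriptors, codebook):
    """Map each local descriptor to the index of its nearest codeword."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def bag_of_visterms(descriptors, codebook):
    """Normalised visterm histogram: weight p_i = n_i / sum_j n_j."""
    idx = assign_visterms(descriptors, codebook)
    counts = np.bincount(idx, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # toy 3-word codebook
descs = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.1, 1.0]])
p = bag_of_visterms(descs, codebook)
# p sums to 1; codeword 1 receives weight 0.5 (two of the four descriptors).
```

In practice the codebook has hundreds to thousands of entries and the descriptors are high-dimensional local features; the quantise-and-count structure is the same.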
“…These are Corel-5k [4], ESP Game [20] and IAPRTC-12 [7]. While Corel-5k has become the de facto dataset in this domain, the other two datasets are very challenging, with significant diversity among their samples.…”
Section: Datasets and Features
confidence: 99%
“…Figure 1: Example images from the Corel-5k dataset [4] and corresponding ground-truth labels ("cars, tracks, prototype"; "grass, flowers, petals"; "sky, grass, plane, lion"; "tree, grass, tiger, park"). The first image is an example of incomplete labeling (tagged with "car" but not with "vehicle"); the second is an example of label ambiguity (tagged with "flowers", though "blooms" would also have been equally correct); and the third and fourth are examples of structural overlap ("lion" and "tiger" are two different but structurally related labels).…”
Section: Introduction
confidence: 99%