2007
DOI: 10.1109/tpami.2007.1155

A Thousand Words in a Scene

Abstract: This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether some analogies between discrete scene representations and text documents exist, and (3) whether unsupervised, latent space models can be used …
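The bag-of-visterms representation named in the abstract (a histogram of quantized local visual features) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `nearest_visterm` and `bag_of_visterms`, the use of squared Euclidean distance for quantization, and the toy two-dimensional descriptors are all assumptions made for illustration. In practice the codebook would typically be learned by clustering real local descriptors.

```python
from collections import Counter


def nearest_visterm(descriptor, codebook):
    """Return the index of the codebook entry closest to the descriptor.

    Closeness is squared Euclidean distance (an assumption for this sketch).
    """
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
    )


def bag_of_visterms(descriptors, codebook):
    """Quantize each local descriptor and return a normalized histogram.

    The result has one bin per codebook entry (visterm); bins sum to 1
    unless the image has no descriptors, in which case all bins are 0.
    """
    counts = Counter(nearest_visterm(d, codebook) for d in descriptors)
    total = len(descriptors)
    return [counts[k] / total if total else 0.0 for k in range(len(codebook))]


# Hypothetical toy data: a 2-word codebook and four 2-D descriptors.
codebook = [[0, 0], [10, 10]]
descriptors = [[0, 1], [1, 0], [9, 9], [0, 0]]
histogram = bag_of_visterms(descriptors, codebook)
```

Like a bag-of-words vector for a text document, the histogram discards the spatial arrangement of the features, which is what allows text modeling methods to be applied to it.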

Citations: cited by 174 publications (107 citation statements)
References: 41 publications (122 reference statements)

“…Topic models are widely applied in image classification [20]. The topic models are particularly effective when pairing with the BoW representation, where the models group ambiguous codewords together and generate a topic distribution over a codebook.…”
Section: Related Work (mentioning)
confidence: 99%
“…(2) To cluster STIPs, K-means is used in the feature space of the interest points. Recently, semantic based clustering strategies are proposed to resolve the difficulties in selecting a proper K value for the K-means algorithm and the disagreement between appearance similarity and semantic consistency (Quelhas et al 2007). Based on Dollár's ST interest point detector, Niebles et al model actions using a bag-of-words model, and cluster the interest points by the underlying "topics" (Niebles et al 2008).…”
Section: Spatiotemporal Interest Point Based Approaches (mentioning)
confidence: 99%
“…Another similar part-based image representations that are proposed recently are visterms [15,23,24], SIFT-bags [39], blobs [7], and VLAD [14] vector representation of an image which aggregates descriptors based on a locality criterion in the feature space. The different approach is the one proposed by Morand et al [21].…”
Section: Analogy Between Information Retrieval and CBIR (mentioning)
confidence: 99%