A Thousand Words in a Scene

Quelhas, Pedro; Monay, Florent; Odobez, Jean-Marc; Gática-Pérez, Daniel; Tuytelaars, Tinne

doi:10.1109/tpami.2007.1155

Cited by 174 publications

(107 citation statements)

References 41 publications

(122 reference statements)


“…Topic models are widely applied in image classification [20]. The topic models are particularly effective when pairing with the BoW representation, where the models group ambiguous codewords together and generate a topic distribution over a codebook.…”

Section: Related Work (mentioning)

confidence: 99%

Enhanced Random Forest with Image/Patch-Level Learning for Image Understanding

Hoo, Kim, Pei, et al. 2014

2014 22nd International Conference on Pattern Recognition


Abstract: Image understanding is an important research domain in computer vision due to its wide real-world applications. For an image understanding framework that uses the Bag-of-Words model representation, the visual codebook is an essential part. Random forest (RF) as a tree-structure discriminative codebook has been a popular choice. However, the performance of the RF can be degraded if the local patch labels are poorly assigned. In this paper, we tackle this problem by a novel way to update the RF codebook learning for a more discriminative codebook with the introduction of the soft class labels, estimated from the pLSA model based on a feedback scheme. The feedback scheme is performed on both the image and patch levels respectively, which is in contrast to the state-of-the-art RF codebook learning that focused on either image or patch level only. Experiments on 15-Scene and C-Pascal datasets show the effectiveness of the proposed method in the image understanding task.


“…(2) To cluster STIPs, K-means is used in the feature space of the interest points. Recently, semantic based clustering strategies are proposed to resolve the difficulties in selecting a proper K value for the K-means algorithm and the disagreement between appearance similarity and semantic consistency (Quelhas et al 2007). Based on Dollár's ST interest point detector, Niebles et al model actions using a bag-of-word model, and cluster the interest points by the underlying "topics" (Niebles et al 2008).…”

Section: Spatiotemporal Interest Point Based Approaches (mentioning)

confidence: 99%

Mining Layered Grammar Rules for Action Recognition

Wang, Gao 2010

Int J Comput Vis


We propose a layered-grammar model to represent actions. Using this model, an action is represented by a set of grammar rules. The bottom layer of an action instance's parse tree contains action primitives such as spatiotemporal (ST) interest points. At each layer above, we iteratively mine grammar rules and "super rules" that account for the high-order compositional feature structures. The grammar rules are categorized into three classes according to three different ST-relations of their action components, namely the strong relation, weak relation and stochastic relation. These ST-relations characterize different action styles (degree of stiffness), and they are pursued in terms of grammar rules for the purpose of action recognition. By adopting the Emerging Pattern (EP) mining algorithm for relation pursuit, the learned production rules are statistically significant and discriminative. Using the learned rules, the parse tree of an action video is constructed by combining a bottom-up rule detection step and a top-down ambiguous rule pruning step. An action instance is recognized based on the discriminative configurations generated by the production rules of its parse tree. Experiments confirm that by incorporating the high-order feature statistics, the proposed method largely improves the recognition performance over the bag-of-words models.


“…Other similar part-based image representations that have been proposed recently are visterms [15,23,24], SIFT-bags [39], blobs [7], and the VLAD [14] vector representation of an image, which aggregates descriptors based on a locality criterion in the feature space. A different approach is the one proposed by Morand et al [21].…”

Section: Analogy Between Information Retrieval and CBIR (mentioning)

confidence: 99%

Toward a higher-level visual representation for content-based image retrieval

Sayad, Martinet, Urruty, et al. 2010

Proceedings of the 8th International Conference on Advances in Mobile Computing and Multimedia


Having effective methods to access the desired images is essential nowadays with the availability of a huge amount of digital images. The proposed approach is based on an analogy between content-based image retrieval and text retrieval. The aim of the approach is to build a meaningful mid-level representation of images to be used later on for matching between a query image and other images in the desired database. The approach is based firstly on constructing different visual words using local patch extraction and fusion of descriptors. Secondly, we introduce a new method using multilayer pLSA to eliminate the noisiest words generated by the vocabulary building process. Thirdly, a new spatial weighting scheme is introduced that consists of weighting visual words according to the probability of each visual word to belong to each of the n Gaussian. Finally, we construct visual phrases from groups of visual words that are involved in strong association rules. Experimental results show that our approach outperforms the results of traditional image retrieval techniques.
