Based on keypoints extracted as salient image patches, an image can be described as a "bag of visual words," and this representation has been used in scene classification. The choice of dimension, selection, and weighting of visual words in this representation is crucial to classification performance but has not been thoroughly studied in previous work. Given the analogy between this representation and the bag-of-words representation of text documents, we apply techniques used in text categorization, including term weighting, stop word removal, and feature selection, to generate image representations that differ in the dimension, selection, and weighting of visual words. The impact of these representation choices on scene classification is studied through extensive experiments on the TRECVID and PASCAL collections. This study provides an empirical basis for designing visual-word representations that are likely to produce superior classification performance.
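The term-weighting idea carried over from text categorization can be sketched concretely. The snippet below applies tf-idf weighting to a bag-of-visual-words count matrix; the function name, the L2 normalization, and the smoothing constants are illustrative choices, not the specific scheme evaluated in the paper.

```python
import numpy as np

def tfidf_weight(counts):
    """Apply tf-idf weighting to a bag-of-visual-words count matrix.

    counts: (n_images, vocab_size) array of visual-word occurrence counts.
    Returns an L2-normalized tf-idf matrix of the same shape.
    """
    counts = np.asarray(counts, dtype=float)
    n_images = counts.shape[0]
    df = (counts > 0).sum(axis=0)                 # document frequency per visual word
    idf = np.log(n_images / np.maximum(df, 1))    # rare words get higher weight
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)
    w = tf * idf
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    return w / np.maximum(norms, 1e-12)           # L2-normalize each image vector
```

Note that a visual word occurring in every image receives idf = 0 and is effectively removed, which is the weighting-based analogue of stop word removal.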
Current web video search results rely exclusively on text keywords or user-supplied tags. A search for a typical popular video often returns many duplicate and near-duplicate videos in the top results. This paper outlines ways to cluster and filter out near-duplicate videos using a hierarchical approach. Initial triage is performed using fast signatures derived from color histograms. Only when a video cannot be clearly classified as novel or near-duplicate using global signatures do we apply a more expensive local-feature-based near-duplicate detection, which provides very accurate duplicate analysis through more costly computation. The results of 24 queries on a data set of 12,790 videos retrieved from Google, Yahoo!, and YouTube show that this hierarchical approach can dramatically reduce the redundant videos displayed to the user in the top result set at relatively small computational cost.
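The hierarchical triage described above can be sketched as a cheap global-signature comparison with a fallback to an expensive local-feature check only for ambiguous pairs. The histogram bin count, the L1 distance, and the two thresholds below are assumptions for illustration, not the paper's tuned values.

```python
import numpy as np

def color_signature(frames, bins=8):
    """Global signature: normalized per-channel color histogram
    accumulated over a video's frames ((H, W, 3) uint8 arrays)."""
    hist = np.zeros(bins * 3)
    for f in frames:
        for c in range(3):
            h, _ = np.histogram(f[..., c], bins=bins, range=(0, 256))
            hist[c * bins:(c + 1) * bins] += h
    return hist / max(hist.sum(), 1)

def hierarchical_filter(sig_a, sig_b, local_check, t_dup=0.10, t_novel=0.45):
    """Two-stage near-duplicate decision (illustrative thresholds).
    Cheap L1 distance on global signatures settles clear cases;
    only ambiguous pairs fall through to the costly local-feature check."""
    d = np.abs(sig_a - sig_b).sum()
    if d < t_dup:
        return True        # clearly near-duplicate
    if d > t_novel:
        return False       # clearly novel
    return local_check()   # ambiguous: expensive local keypoint matching
```

The design point is that `local_check` (however it is implemented) is never invoked for the majority of pairs, which is what keeps the overall cost small.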
Abstract. Human action recognition in videos is a challenging problem with wide applications. State-of-the-art approaches often adopt the popular bag-of-features representation based on isolated local patches or temporal patch trajectories, where motion patterns such as object relationships are mostly discarded. This paper proposes a simple representation specifically aimed at modeling such motion relationships. We adopt global and local reference points to characterize motion information, so that the final representation is robust to camera movement. Our approach operates on top of visual codewords derived from local patch trajectories and therefore does not require accurate foreground-background separation, which is typically a necessary step in modeling object relationships. Through an extensive experimental evaluation, we show that the proposed representation offers very competitive performance on challenging benchmark datasets, and that combining it with the bag-of-features representation leads to substantial improvement. On the Hollywood2, Olympic Sports, and HMDB51 datasets, we obtain 59.5%, 80.6%, and 40.7% respectively, which are the best reported results to date.
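The reference-point idea can be illustrated with a minimal sketch: describing each trajectory's displacement relative to a global reference (here, crudely, the mean displacement of all trajectories as a stand-in for camera motion). This is only a toy version of the global-reference case; the paper's actual construction, including local reference points and codeword-level modeling, is richer.

```python
import numpy as np

def relative_motion(trajectories):
    """Motion described relative to a global reference point.

    trajectories: (n_traj, T, 2) array of 2-D point positions over T frames.
    Subtracting the mean frame-to-frame displacement of all trajectories
    (a crude proxy for camera motion) leaves motion relative to the scene.
    """
    disp = np.diff(np.asarray(trajectories, dtype=float), axis=1)  # (n, T-1, 2)
    global_ref = disp.mean(axis=0, keepdims=True)                  # shared motion
    return disp - global_ref
```

Under this sketch, a pure camera pan (all trajectories sharing the same displacement) maps to zero relative motion, which is the robustness property the abstract refers to.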
In many real-world applications, we often face the problem of cross-domain learning, i.e., borrowing labeled data or transferring already-learned knowledge from a source domain to a target domain. However, simply applying existing source data or knowledge may even hurt performance, especially when the data distributions in the source and target domains are quite different, or when very few labeled examples are available in the target domain. This paper proposes a novel domain adaptation framework, named Semi-supervised Domain Adaptation with Subspace Learning (SDASL), which jointly explores invariant low-dimensional structures across domains to correct data distribution mismatch and leverages available unlabeled target examples to exploit the underlying intrinsic information in the target domain. Specifically, SDASL conducts the learning by simultaneously minimizing the classification error, preserving the structure within and across domains, and restricting similarity defined on unlabeled target examples. Encouraging results are reported for two challenging domain transfer tasks (including image-to-image and image-to-video transfers) on several standard datasets in the context of both image object recognition and video concept detection.
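To make the "invariant low-dimensional structure" intuition concrete, here is a deliberately loose sketch: learning a single subspace from pooled source and target features (via plain PCA) and projecting both domains into it. SDASL itself learns the subspace jointly with classification and similarity terms, so this should be read as background intuition only, with the function name and dimensionality as assumptions.

```python
import numpy as np

def shared_subspace(X_src, X_tgt, dim=2):
    """Project source and target features into one shared low-dimensional
    subspace found by PCA on the pooled data (illustrative sketch only;
    SDASL couples subspace learning with classification and similarity
    objectives rather than using plain PCA)."""
    X = np.vstack([X_src, X_tgt])
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:dim].T                                 # (features, dim) projection
    return (X_src - mu) @ W, (X_tgt - mu) @ W
```

Projecting both domains through one map is what lets a classifier trained mostly on source data remain meaningful on target features.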
The identification of near-duplicate keyframe (NDK) pairs is a useful task for a variety of applications such as news story threading and content-based video search. In this paper, we propose a novel approach for the discovery and tracking of NDK pairs and threads in the broadcast domain. The detection of NDKs in a large data set is challenging because, as the data set grows linearly, the computational cost grows quadratically, and so does the number of false alarms. This paper exploits the symmetric and transitive nature of near-duplicates for the effective detection and fast tracking of NDK pairs based upon the matching of local keypoints in frames. In the detection phase, we propose a robust measure, namely pattern entropy (PE), to measure the coherency of symmetric keypoint matching across the space of two keyframes. This measure is shown to be effective in discovering the NDK identity of a frame. In the tracking phase, NDK pairs and threads are rapidly propagated and linked through transitivity without the need for detection. This step yields a significant boost in speed. We evaluate our proposed approach on a month of the TRECVID-2004 broadcast videos. The experimental results indicate that our approach outperforms other techniques in recall and precision by a large margin. In addition, by considering transitivity and the underlying distribution of NDK pairs over time, a speed-up of 3 to 5 times is achieved while keeping performance close to the optimum obtained by exhaustive evaluation.
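The transitive propagation in the tracking phase is naturally expressed with a union-find (disjoint-set) structure: once (a, b) and (b, c) are detected as NDK pairs, a, b, and c fall into one thread without running detection on (a, c). The class below is a generic sketch of that bookkeeping, not the paper's implementation.

```python
class NDKThreads:
    """Union-find sketch for propagating near-duplicate keyframe (NDK)
    pairs into threads via transitivity."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        """Return the thread representative of x, with path halving."""
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def link_pair(self, a, b):
        """Record a detected NDK pair, merging the two threads."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

    def same_thread(self, a, b):
        """True if a and b are linked, directly or transitively."""
        return self.find(a) == self.find(b)
```

Each `find` is nearly constant-time amortized, which is the source of the speed-up: linked frames are resolved by lookup rather than by re-running keypoint matching.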
The explosive growth of web videos poses the challenge of how to efficiently browse hundreds or even thousands of videos at a glance. Given an event-driven query, social media web sites can easily return a ranked list of videos that is large but diverse and somewhat noisy. Users often need to painstakingly explore the retrieved list for an overview of the event. This paper presents a novel solution that mines and threads "key" shots, which provide an overview of the main contents of the videos at a glance by summarizing a large set of diverse videos. The proposed framework comprises three stages for multi-video summarization. First, given an event query, a ranked list of web videos together with their associated tags is retrieved. Key shots are then established by near-duplicate keyframe detection, ranked according to informativeness, and threaded in chronological order. Finally, summarization is formulated as an optimization procedure that trades off the relevance of key shots against a user-defined skimming ratio. The framework presents the summary as a dynamic video skim. We conduct user studies on twelve event queries covering over one hundred hours of videos crawled from YouTube. The evaluation demonstrates the feasibility and effectiveness of the proposed solution.
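The relevance-versus-skimming-ratio trade-off can be sketched as budgeted selection: a greedy stand-in for the paper's optimization that picks the shots with the best relevance per second until the skim budget is spent. The tuple layout and the id-order proxy for chronology are assumptions for illustration.

```python
def select_key_shots(shots, skim_ratio):
    """Greedy sketch of the summarization trade-off.

    shots: list of (shot_id, relevance, duration_seconds) tuples.
    skim_ratio: user-defined fraction of total duration to keep.
    Chooses shots with the highest relevance per second until the
    duration budget is reached (the paper formulates this as an
    optimization; this greedy version is only illustrative).
    """
    budget = skim_ratio * sum(d for _, _, d in shots)
    chosen, used = [], 0.0
    for sid, rel, dur in sorted(shots, key=lambda s: s[1] / s[2], reverse=True):
        if used + dur <= budget:
            chosen.append(sid)
            used += dur
    return sorted(chosen)  # assumes shot ids follow chronological order
```

A larger `skim_ratio` admits more, less relevant shots, which is exactly the knob the user-defined skimming ratio exposes.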