Proceedings of the 15th ACM International Conference on Multimedia 2007
DOI: 10.1145/1291233.1291280

Practical elimination of near-duplicates from web video search

Abstract: Current web video search results rely exclusively on text keywords or user-supplied tags. A search on a typical popular video often returns many duplicate and near-duplicate videos in the top results. This paper outlines ways to cluster and filter out near-duplicate videos using a hierarchical approach. Initial triage is performed using fast signatures derived from color histograms. Only when a video cannot be clearly classified as novel or near-duplicate using global signatures do we apply a more expensive local…
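The hierarchical scheme the abstract describes (cheap global color-histogram signatures for initial triage, with more expensive local features reserved for ambiguous cases) might be sketched roughly as below. The frame-sampling step, histogram binning, and distance thresholds are illustrative assumptions, not the paper's actual parameters.

```python
# Minimal sketch of hierarchical near-duplicate triage: a global signature
# (average HSV color histogram over sampled frames) settles the clear cases;
# only ambiguous pairs would be escalated to local-feature matching.
import cv2
import numpy as np

def global_signature(video_path, step=30, bins=8):
    """Average an HSV color histogram over every `step`-th frame."""
    cap = cv2.VideoCapture(video_path)
    hists, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            h = cv2.calcHist([hsv], [0, 1, 2], None, [bins] * 3,
                             [0, 180, 0, 256, 0, 256])
            hists.append(cv2.normalize(h, h).flatten())
        idx += 1
    cap.release()
    return np.mean(hists, axis=0) if hists else None

def triage(sig_a, sig_b, low=0.15, high=0.45):
    """Classify a video pair from its global signatures alone."""
    d = float(np.linalg.norm(sig_a - sig_b))
    if d < low:
        return "near-duplicate"
    if d > high:
        return "novel"
    return "ambiguous"  # escalate to local-feature comparison
```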

Cited by 306 publications (345 citation statements)
References 37 publications
“…The dictionary is generated by clustering 1,185,698 keypoints extracted from 3,000 keyframes randomly sampled from the dataset of [1]. We employ DoG [6] for keypoint detection and PSIFT [9] for feature description.…”
Section: Visual Keyword (mentioning)
confidence: 99%
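The visual-keyword dictionary this citation describes (clustering keypoint descriptors pooled from sampled keyframes into visual words) could be sketched along the following lines. Standard OpenCV SIFT stands in here for the PSIFT descriptor used in the citing paper, and the vocabulary size is an assumed value.

```python
# Rough sketch of building a visual-word dictionary: detect DoG keypoints,
# describe them (SIFT as a stand-in for PSIFT), and cluster the pooled
# descriptors; each cluster center becomes one visual keyword.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def build_dictionary(keyframe_paths, vocab_size=1000):
    sift = cv2.SIFT_create()  # DoG detector + SIFT descriptor
    descriptors = []
    for path in keyframe_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            continue
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    all_desc = np.vstack(descriptors).astype(np.float32)
    kmeans = MiniBatchKMeans(n_clusters=vocab_size, batch_size=10_000,
                             random_state=0).fit(all_desc)
    return kmeans.cluster_centers_  # one row per visual word
```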
“…A rough statistic, as indicated in [1], shows that more than 65,000 videos are uploaded to the video sharing web site YouTube daily. It is believed that this number is still increasing rapidly.…”
Section: Introduction (mentioning)
confidence: 99%
“…In [15], near-duplicate frames are identified in the TRECVID 2004 video corpus, but 16 seconds are required for searching 150 frames in 10 minutes of video (600 frames). An application of CBVCD to the elimination of video duplicates in Web search is proposed in [14]; global descriptors help separate the least similar videos, then local descriptors refine duplicate detection. However, several minutes are required to return the top 10 answers (among the 600 preliminary results) to a keyword-based query, which is too long for the query-dependent online processing we need.…”
Section: Content-Based Video Copy Detection for Video Mining (mentioning)
confidence: 99%
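The second, more expensive stage referred to here (local descriptors refining what the global descriptors cannot settle) is, in generic form, a local keypoint match between keyframes. The sketch below uses SIFT with Lowe's ratio test; the ratio and minimum-match thresholds are illustrative assumptions rather than values from the paper.

```python
# Generic local-feature refinement for an ambiguous pair: match SIFT
# descriptors between two keyframes with the ratio test and accept the pair
# as near-duplicate if enough good matches survive.
import cv2

def keyframes_match(img_a, img_b, ratio=0.75, min_matches=25):
    sift = cv2.SIFT_create()
    _, desc_a = sift.detectAndCompute(img_a, None)
    _, desc_b = sift.detectAndCompute(img_b, None)
    if desc_a is None or desc_b is None:
        return False
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = 0
    for pair in matcher.knnMatch(desc_a, desc_b, k=2):
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good += 1
    return good >= min_matches
```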
“…Therefore, an important problem now faced by these video sharing sites is how to automatically perform accurate and fast similarity search for an incoming video clip against its huge database, to avoid copyright violation. Meanwhile, since the retrieval efficiency will be hampered if a large number of search results are essentially almost-identical, database purge also contributes to high-quality ranking for video search results [34].…”
Section: Introduction (mentioning)
confidence: 99%