2010
DOI: 10.1007/978-3-642-12900-1_14

YouTube Scale, Large Vocabulary Video Annotation

Abstract: As video content on the web continues to expand, it is increasingly important to properly annotate videos for effective search and mining. While the idea of annotating static imagery with keywords is relatively well known, the idea of annotating videos with natural language keywords to enhance search is an important emerging problem with great potential to improve the quality of video search. However, leveraging web-scale video datasets for automated annotation also presents new challenges and requires methods…

Cited by 13 publications (11 citation statements)
References 42 publications
“…every minute, 100 hours of video are uploaded to YouTube.¹ However, if a video is poorly tagged, its utility is dramatically diminished [24]. Automatic video description generation has the potential to help improve indexing and search quality for online videos.…”
Section: Introduction (mentioning, confidence: 99%)
“…First, LSH is more suitable for indexing global image descriptors (e.g., CNN) while vocabulary tree is built to index local image descriptors (e.g., SIFT). Second, LSH shows better search performance because of using the hashing technique while the vocabulary tree uses the recursive clustering to partition the space resulting in worse performance and higher inaccuracy especially when the tree becomes deeper [22].…”
Section: Spatial-visual Search (mentioning, confidence: 99%)
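The statement above contrasts hashing-based indexing of global descriptors with a recursively clustered vocabulary tree. As a rough illustration of the hashing side only, here is a minimal sketch of random-hyperplane LSH (SimHash-style bit signatures for cosine similarity) over global descriptor vectors. This LSH family is swapped in for illustration and is not claimed to be the method of [22] or of the quoting paper; the names `RandomHyperplaneLSH` and `cosine` and the parameters `n_bits` and `n_tables` are hypothetical, and no vocabulary tree is implemented for comparison.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class RandomHyperplaneLSH:
    """Illustrative random-hyperplane LSH index for global descriptors.

    n_bits (hyperplanes per table) and n_tables are hypothetical tuning
    parameters, not values taken from any of the cited papers.
    """

    def __init__(self, dim, n_bits=8, n_tables=4, seed=0):
        rng = random.Random(seed)
        # Each table gets its own n_bits random Gaussian hyperplanes.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
            for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.vectors = {}

    @staticmethod
    def signature(vec, planes):
        # One bit per hyperplane: which side of the plane the vector lies on.
        return tuple(
            int(sum(p * x for p, x in zip(plane, vec)) >= 0.0) for plane in planes
        )

    def add(self, key, vec):
        self.vectors[key] = list(vec)
        for planes, table in zip(self.planes, self.tables):
            table[self.signature(vec, planes)].append(key)

    def query(self, vec, k=1):
        # Candidates: items sharing a bucket with the query in any table.
        candidates = set()
        for planes, table in zip(self.planes, self.tables):
            candidates.update(table.get(self.signature(vec, planes), ()))
        # Re-rank only the candidate set by exact cosine similarity.
        ranked = sorted(
            candidates, key=lambda c: cosine(vec, self.vectors[c]), reverse=True
        )
        return ranked[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    index = RandomHyperplaneLSH(dim=16)
    for i in range(100):
        index.add(f"img{i}", [rng.gauss(0.0, 1.0) for _ in range(16)])
    # A slightly perturbed copy of img7 is very likely to share its buckets.
    probe = [x + 0.01 for x in index.vectors["img7"]]
    print(index.query(probe, k=1))
```

Using several short tables trades memory for recall: a near-duplicate only needs to collide with the query in one table to become a candidate, and only that small candidate set is compared exactly, instead of descending a tree level by level.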
“…Brezeale and Cook [17] surveyed text, video, and audio features for classifying videos into a predefined set of genres, e.g., "sports" or "comedy". Morsillo et al [94] presented a brief review that focused on efficient and scalable methods for annotating Web videos at various levels including objects, scenes, actions, and high-level events. Lavee et al [67] reviewed event modeling methods, mostly in the context of simple human activity analysis.…”
Section: Related Reviews (mentioning, confidence: 99%)