2016
DOI: 10.1609/aaai.v30i1.10383

Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding

Abstract: Recent years have witnessed a boom in online media sharing, which raises significant challenges for effective management and retrieval. Though considerable effort has been made, precise retrieval of video shots on specific topics has been largely ignored. At the same time, due to the popularity of novel time-sync comments, or so-called "bullet-screen comments", video semantics can now be combined with timestamps to support further research on temporal video labeling. In this paper, we propos…

Cited by 31 publications (6 citation statements)
References 16 publications

“…Dieng et al [30] have presented a topic-based recurrent neural network (RNN) for sentiment analysis. Lv et al [98] have used LDA and deep learning to describe videos using language. Recently, Dong et al [32] have used LDA-based topic discovery and learning to produce interpretable deep learning for video description.…”
Section: Multi-scale
confidence: 99%
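The LDA-based pipelines this excerpt refers to start from plain topic discovery over text. Below is a minimal, illustrative Python sketch of that first step using gensim; the toy comment corpus, topic count, and hyperparameters are assumptions for illustration, not details taken from the cited papers.

```python
# Minimal sketch: LDA topic discovery over short time-sync comments,
# in the spirit of the LDA-based pipelines cited above.
# The toy corpus and hyperparameters are illustrative only.
from gensim import corpora
from gensim.models import LdaModel

comments = [
    "the fight scene here is amazing",
    "great fight choreography in this scene",
    "the soundtrack during the opening is beautiful",
    "love the opening theme music",
]
tokenized = [c.lower().split() for c in comments]

dictionary = corpora.Dictionary(tokenized)                # token -> integer id
bow_corpus = [dictionary.doc2bow(t) for t in tokenized]   # sparse word counts

lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
               passes=10, random_state=0)

for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```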
“…The bag-of-words is constructed using low-level features such as pixels [43] or motion tracks [152]. The problem of representing similar concepts with similar bags-of-words is solved using the contextual relevance representation [59,98]. The method embeds language with visual words to find similar concepts and is applied in video analysis.…”
Section: Topic Representation and Feature Embedding
confidence: 99%
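To make the bag-of-visual-words construction concrete, here is a minimal sketch: low-level descriptors (standing in for pixel or motion-track features) are clustered into a visual vocabulary with k-means, and each video becomes a normalized histogram over that vocabulary. The random descriptors, the vocabulary size k, and the bag_of_words helper are hypothetical.

```python
# Minimal sketch of a bag-of-visual-words representation built from
# low-level features. The random descriptors below are placeholders
# for pixel- or motion-track-based features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 32))   # e.g. one 32-d feature per patch/track

k = 16                                     # visual vocabulary size (assumed)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)

def bag_of_words(video_descriptors):
    """Normalized histogram of visual-word assignments for one video."""
    words = kmeans.predict(video_descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

print(bag_of_words(descriptors[:50]))
```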
“…Intuitively, each embedding matrix can compress the vocabulary into a low-dimensional space (Li et al 2015; Lv et al 2016). However, because the multiple embedding spaces are constructed based on different contexts in the text corpus, the embedding spaces are different from each other.…”
Section: The Context-enriched Neural Network
confidence: 99%
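A minimal sketch of the compression this excerpt describes, assuming a plain lookup-table embedding; the vocabulary size, dimension, token ids, and mean pooling are illustrative choices, not details of the cited models.

```python
# Minimal sketch: an embedding matrix maps a V-word vocabulary into a
# d-dimensional space with d << V. All sizes and ids are illustrative.
import numpy as np

V, d = 10_000, 100                        # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V, d))    # stands in for a learned embedding matrix

word_ids = [12, 857, 3201]                # token ids for one comment
vectors = E[word_ids]                     # lookup: one dense vector per word
comment_vec = vectors.mean(axis=0)        # simple pooled comment representation

# Note: two embedding matrices trained on different contexts generally
# define different spaces, so rows of E1 and E2 are not directly comparable.
print(comment_vec.shape)
```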
“…Efforts have also been devoted to associating comments with video content along the timeline. In (Lv et al. 2016), time-sync comments are first represented with semantic vectors; then a video-splitting framework extracts and labels meaningful segments by mapping the semantic vectors to predefined labels in a supervised way. However, this model relies on a large amount of human-labeled video segments and predefined emotional tags for training, which limits its applicability to more general scenarios.…”
Section: Analysis Of Time-sync Video Comments
confidence: 99%
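The supervised mapping step this excerpt describes can be sketched as follows: pooled comment embeddings per segment are fed to a classifier trained against predefined labels. The synthetic vectors, the four-label tag set, and the logistic-regression choice are all assumptions standing in for the human-labeled data the model actually requires.

```python
# Minimal sketch of the supervised step described above: segment-level
# semantic vectors (pooled comment embeddings) are mapped to predefined
# labels by a classifier. Vectors, labels, and the classifier choice are
# synthetic stand-ins for the human-labeled training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
segment_vectors = rng.normal(size=(200, 100))   # pooled comment embeddings
labels = rng.integers(0, 4, size=200)           # 4 predefined tags (assumed)

clf = LogisticRegression(max_iter=1000).fit(segment_vectors, labels)

new_segment = rng.normal(size=(1, 100))
print(clf.predict(new_segment))                 # predicted temporal label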
“…Recently, methods have been proposed to generate temporal tags or labels based on crowdsourced time-sync video comments (Lv et al. 2016), which mainly focus on extracting keywords such as topics or semantic labels. On the other hand, keywords are sometimes not sufficient to describe a scene, especially when the scene includes a number of characters or depicts a complicated situation.…”
Section: Introduction
confidence: 99%
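A keyword-extraction approach in this spirit can be sketched by grouping comments into fixed time windows and tagging each window with its top TF-IDF terms; the window layout and comments below are hypothetical.

```python
# Minimal sketch of keyword-style temporal tagging: comments are grouped
# into fixed time windows and each window is tagged with its highest-weight
# TF-IDF terms. Window boundaries and comments are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer

windows = {  # window start (seconds) -> concatenated comments
    0:  "opening theme music beautiful music",
    30: "fight scene amazing fight choreography",
}

vec = TfidfVectorizer()
tfidf = vec.fit_transform(windows.values()).toarray()
terms = vec.get_feature_names_out()

for start, row in zip(windows, tfidf):
    top = row.argsort()[::-1][:2]          # two highest-weight terms
    print(f"{start}s: {[terms[i] for i in top]}")
```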