2016
DOI: 10.1609/aaai.v30i1.10383

Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding

Abstract: Recent years have witnessed a boom in online media sharing, which raises significant challenges for effective management and retrieval. Though considerable effort has been made, precise retrieval of video shots on specific topics has been largely ignored. At the same time, due to the popularity of novel time-sync comments, or so-called "bullet-screen comments", video semantics can now be combined with timestamps to support further research on temporal video labeling. In this paper, we propos…

Cited by 31 publications (6 citation statements)
References 16 publications

“…Dieng et al [30] have presented a topic-based recurrent neural network (RNN) for sentiment analysis. Lv et al [98] have used LDA and deep learning to describe videos using language. Recently, Dong et al [32] have used LDA-based topic discovery and learning to produce interpretable deep learning for video description.…”
Section: Multi-scale
confidence: 99%
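The LDA-based pipelines this excerpt refers to start from plain topic discovery over text. Below is a minimal, illustrative Python sketch of that first step using gensim; the toy comment corpus, topic count, and hyperparameters are assumptions for illustration, not details taken from the cited papers.

```python
# Minimal sketch: LDA topic discovery over short time-sync comments,
# in the spirit of the LDA-based pipelines cited above.
# The toy corpus and hyperparameters are illustrative only.
from gensim import corpora
from gensim.models import LdaModel

comments = [
    "the fight scene here is amazing",
    "great fight choreography in this scene",
    "the soundtrack during the opening is beautiful",
    "love the opening theme music",
]
tokenized = [c.lower().split() for c in comments]

dictionary = corpora.Dictionary(tokenized)                # token -> integer id
bow_corpus = [dictionary.doc2bow(t) for t in tokenized]   # sparse word counts

lda = LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
               passes=10, random_state=0)

for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```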
“…The bag-of-words is constructed using low-level features such as pixels [43] or motion tracks [152]. The problem of representing similar concepts with similar bags-of-words is solved using the contextual relevance representation [59,98]. The method embeds language with visual words to find similar concepts and is applied in video analysis.…”
Section: Topic Representation and Feature Embedding
confidence: 99%
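To make the bag-of-visual-words construction concrete, here is a minimal sketch: low-level descriptors (standing in for pixel or motion-track features) are clustered into a visual vocabulary with k-means, and each video becomes a normalized histogram over that vocabulary. The random descriptors, the vocabulary size k, and the bag_of_words helper are hypothetical.

```python
# Minimal sketch of a bag-of-visual-words representation built from
# low-level features. The random descriptors below are placeholders
# for pixel- or motion-track-based features.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
descriptors = rng.normal(size=(500, 32))   # e.g. one 32-d feature per patch/track

k = 16                                     # visual vocabulary size (assumed)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)

def bag_of_words(video_descriptors):
    """Normalized histogram of visual-word assignments for one video."""
    words = kmeans.predict(video_descriptors)
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()

print(bag_of_words(descriptors[:50]))
```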
“…Intuitively, each embedding matrix can compress the vocabulary into a low-dimensional space (Li et al 2015; Lv et al 2016). However, because the multiple embedding spaces are constructed based on different contexts in the text corpus, the embedding spaces are different from each other.…”
Section: The Context-enriched Neural Network
confidence: 99%
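A minimal sketch of the compression this excerpt describes, assuming a plain lookup-table embedding; the vocabulary size, dimension, token ids, and mean pooling are illustrative choices, not details of the cited models.

```python
# Minimal sketch: an embedding matrix maps a V-word vocabulary into a
# d-dimensional space with d << V. All sizes and ids are illustrative.
import numpy as np

V, d = 10_000, 100                        # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
E = rng.normal(scale=0.1, size=(V, d))    # stands in for a learned embedding matrix

word_ids = [12, 857, 3201]                # token ids for one comment
vectors = E[word_ids]                     # lookup: one dense vector per word
comment_vec = vectors.mean(axis=0)        # simple pooled comment representation

# Note: two embedding matrices trained on different contexts generally
# define different spaces, so rows of E1 and E2 are not directly comparable.
print(comment_vec.shape)
```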
“…Efforts have also been devoted to associating comments with video content along the timeline. In (Lv et al. 2016), time-sync comments are first represented with semantic vectors; then a video-splitting framework extracts and labels meaningful segments by mapping the semantic vectors to predefined labels in a supervised way. However, this model relies on a large amount of human-labeled video segments and predefined emotional tags for training, which limits its applicability to more general scenarios.…”
Section: Analysis Of Time-sync Video Comments
confidence: 99%
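The supervised mapping step this excerpt describes can be sketched as follows: pooled comment embeddings per segment are fed to a classifier trained against predefined labels. The synthetic vectors, the four-label tag set, and the logistic-regression choice are all assumptions standing in for the human-labeled data the model actually requires.

```python
# Minimal sketch of the supervised step described above: segment-level
# semantic vectors (pooled comment embeddings) are mapped to predefined
# labels by a classifier. Vectors, labels, and the classifier choice are
# synthetic stand-ins for the human-labeled training data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
segment_vectors = rng.normal(size=(200, 100))   # pooled comment embeddings
labels = rng.integers(0, 4, size=200)           # 4 predefined tags (assumed)

clf = LogisticRegression(max_iter=1000).fit(segment_vectors, labels)

new_segment = rng.normal(size=(1, 100))
print(clf.predict(new_segment))                 # predicted temporal label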
“…Recently, methods have been proposed to generate temporal tags or labels based on crowdsourced time-sync video comments (Lv et al. 2016), which mainly focus on extracting keywords such as topics or semantic labels. On the other hand, keywords are sometimes not sufficient to describe a scene, especially when the scene includes a number of characters or depicts a complicated situation.…”
Section: Introduction
confidence: 99%
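A keyword-extraction approach in this spirit can be sketched by grouping comments into fixed time windows and tagging each window with its top TF-IDF terms; the window layout and comments below are hypothetical.

```python
# Minimal sketch of keyword-style temporal tagging: comments are grouped
# into fixed time windows and each window is tagged with its highest-weight
# TF-IDF terms. Window boundaries and comments are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer

windows = {  # window start (seconds) -> concatenated comments
    0:  "opening theme music beautiful music",
    30: "fight scene amazing fight choreography",
}

vec = TfidfVectorizer()
tfidf = vec.fit_transform(windows.values()).toarray()
terms = vec.get_feature_names_out()

for start, row in zip(windows, tfidf):
    top = row.argsort()[::-1][:2]          # two highest-weight terms
    print(f"{start}s: {[terms[i] for i in top]}")
```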