Proceedings of the 21st ACM International Conference on Multimedia 2013
DOI: 10.1145/2502081.2502160
Querying for video events by semantic signatures from few examples

Abstract: We aim to query web video for complex events using only a handful of video query examples, where the standard approach learns a ranker from hundreds of examples. We consider a semantic signature representation, consisting of off-the-shelf concept detectors, to capture the variance in semantic appearance of events. Since it is unknown what similarity metric and query fusion to use in such an event retrieval setting, we perform three experiments on unconstrained web videos from the TRECVID event detection task. …

Cited by 37 publications (33 citation statements). References 12 publications.
“…2, we report our performance on the MED TEST dataset consisting of 20 events. We also compare our retrieval performance against a state of the art technique published in [17]. For clarity, we only indicate the respective average precision per event obtained using our best performing method (MNE+QR on fused motion and appearance representations).…”
Section: Results
Confidence: 99%
“…That said, some of the noted work in recent past, pertinent to recognition in unconstrained settings include machine interpretation of either low-level features [13,26] directly extracted from human labeled event videos [18,20] or training intermediate-level semantic concepts that require expensive human annotation [1,17] or a combination of both [9,21].…”
Section: Related Work
Confidence: 99%
“…This is an effective way, but it is not applicable for cases in which no or few examples are available and the models cannot give interpretation or understanding of the semantics in the event. If few examples are available, the web is a powerful tool to get more examples [24,27].…”
Section: Complex Event Detection
Confidence: 99%