Semantic queries involving image understanding aspects require the exploitation of multiple clues, namely the (inter-)relations between objects and events across multiple images, the situational context, and the application context. A prominent example of such queries is the identification of individuals in video sequences. Straightforward face recognition approaches require a model of the persons in question and tend to fail in ill-conditioned environments. An alternative approach is therefore to use the contextual conditions of observations to determine the role a person plays in the current context. Because roles, persons, and their identities are strongly related, knowing one often allows the other to be inferred. This paper presents a system that implements this approach. First, robust face detection localizes the actors in the video. By clustering similar face instances, the relative frequency of their appearance within a sequence is determined. In combination with a coarse textual annotation created manually by the broadcast station's archivist, the roles and consequently the identities can be assigned and labeled in the video. Starting with unambiguous assignments and cascading appropriately, most of the persons can be identified and labeled successfully. The feasibility and performance of the role-based person identification are demonstrated on the basis of several programs of a popular German TV show, which consists of various elements such as interview scenes, games, and musical show acts.
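The cascading step described above can be illustrated with a minimal sketch. It is not the system from the paper: the data layout (`segment_clusters` mapping each annotated segment to the face-cluster ids seen in it, `segment_people` mapping each segment to the names listed by the archivist) and the function name `cascade_assign` are assumptions made for illustration only. The sketch resolves segments that have exactly one open cluster and one open name, then repeats, so that earlier assignments disambiguate later segments.

```python
from typing import Dict, Set


def cascade_assign(
    segment_clusters: Dict[str, Set[int]],
    segment_people: Dict[str, Set[str]],
) -> Dict[int, str]:
    """Assign names to face clusters, resolving unambiguous segments first.

    Both input dictionaries are hypothetical structures, keyed by a segment id.
    """
    assigned: Dict[int, str] = {}
    changed = True
    while changed:
        changed = False
        for segment, clusters in segment_clusters.items():
            # Clusters and names in this segment that are still unresolved.
            open_clusters = clusters - set(assigned)
            open_people = segment_people.get(segment, set()) - set(assigned.values())
            # A segment is unambiguous when exactly one of each remains.
            if len(open_clusters) == 1 and len(open_people) == 1:
                assigned[next(iter(open_clusters))] = next(iter(open_people))
                changed = True
    return assigned


if __name__ == "__main__":
    clusters = {"opening": {0}, "interview": {0, 1}}
    people = {"opening": {"host"}, "interview": {"host", "guest A"}}
    print(cascade_assign(clusters, people))
```

In this toy input the opening segment fixes cluster 0 as the host, which in turn leaves only one cluster and one name open in the interview segment. Segments that stay ambiguous after the cascade are simply left unlabeled.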
Just as books have chapters, sections, paragraphs, and sentences, videos have an inherent hierarchical structure. Chapters, scenes, shots, and sub-shots are the temporal units of a video. Because manual structure extraction is time-consuming, automatic segmentation has been an active research topic over the past 10 to 15 years and is a prerequisite for efficient video indexing, annotation, search, and retrieval. This paper focuses on our recent research on scene, shot, and sub-shot extraction and on combining them into a video structure detection system. The first step is the detection of shot transitions, with separate detectors for hard cuts, fades, dissolves, and wipes. Complex shots are then further segmented into semantically meaningful units called sub-shots. Finally, the results are used to extract scenes. We propose to use film grammar based on shot transition types to improve the results of scene detection. The proposed algorithms are robust to distortions and artefacts found in digitized archived video.
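As a rough illustration of the first step, the following sketch detects hard cuts by thresholding the histogram difference between consecutive frames. This is a common baseline and not necessarily the detector used in the paper; the frame representation (a flat, non-empty sequence of 8-bit gray values), the bin count, and the threshold are all assumptions chosen for the example.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 8) -> List[float]:
    """Normalized gray-level histogram of one frame (values in 0..255)."""
    counts = [0] * bins
    for value in frame:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(frame)  # assumed non-empty
    return [c / total for c in counts]


def detect_hard_cuts(
    frames: Sequence[Sequence[int]],
    threshold: float = 0.5,  # illustrative value, not taken from the paper
    bins: int = 8,
) -> List[int]:
    """Return indices of frames that start a new shot after a hard cut."""
    cuts: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame, bins)
        if previous is not None:
            # Half the L1 distance between histograms lies in [0, 1].
            difference = 0.5 * sum(abs(a - b) for a, b in zip(previous, current))
            if difference > threshold:
                cuts.append(index)
        previous = current
    return cuts


if __name__ == "__main__":
    dark = [[10] * 16, [12] * 16]
    bright = [[200] * 16, [205] * 16]
    print(detect_hard_cuts(dark + bright))
```

Gradual transitions such as fades, dissolves, and wipes change the histogram slowly across many frames, which is one reason a single frame-to-frame threshold like this does not cover them and separate detectors are needed.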
Logical units are semantic video segments above the shot level. Depending on the common semantics within the unit and on the data domain, different types of logical unit extraction algorithms have been presented in the literature. Topic units are typically extracted for documentaries or news broadcasts, while scenes are extracted for narrative-driven video such as feature films, sitcoms, or cartoons. Other types of logical units are extracted from home video and sports. This paper reviews the algorithms used in the literature for the extraction of logical units according to the following categories: unit type, data domain, features used, segmentation method, and thresholds applied. A detailed comparative study is presented for the case of extracting scenes from narrative-driven video. While earlier comparative studies focused either on scene segmentation methods only or on complete news-story segmentation algorithms, this paper investigates various visual features and segmentation methods, their thresholding mechanisms, and their combination into complete scene detection algorithms. The performance of the resulting large set of algorithms is then evaluated on a set of video files including feature films, sitcoms, children's shows, a detective story, and cartoons.