Advances in the media and entertainment industries, for example streaming audio and digital TV, present new challenges for managing large audio-visual collections. Efficient and effective retrieval from large content collections is an important component of content holders' business models, and this is driving research in audio-visual search and retrieval. Current content management systems support retrieval using low-level features such as motion, colour, texture, beat and loudness. However, low-level features often have little meaning for the human users of these systems, who much prefer to identify content using high-level semantic descriptions or concepts. This creates a gap between the system and the user that must be bridged for these systems to be used effectively. The research presented in this paper describes our approach to bridging this gap in a specific content domain: sports video. Our approach combines a number of automatic feature-detection techniques with heuristic rules determined through manual observation of sports footage. This has led to a set of models for interesting sporting events, goal segments, that have been implemented as part of an information retrieval system. The paper also presents results comparing the output of the system against manually identified goals.
Keywords: Content-based retrieval, temporal models, sports video analysis.
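The abstract does not spell out its heuristic rules, so the following is only a minimal sketch of the general pattern it describes: automatic low-level feature detectors whose outputs are combined by hand-written rules to flag candidate goal segments. The `Segment` fields, the thresholds, and the rule itself are hypothetical illustrations, not values from the paper.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float      # segment boundaries in seconds
    end: float
    motion: float     # mean motion activity in [0, 1]
    loudness: float   # mean audio energy in [0, 1]
    whistle: bool     # whistle detected within the segment

def is_goal_candidate(seg: Segment) -> bool:
    # Hypothetical rule: a goal tends to show a burst of motion
    # followed by sustained crowd noise near a referee's whistle.
    return seg.motion > 0.6 and seg.loudness > 0.7 and seg.whistle

segments = [
    Segment(12.0, 18.0, motion=0.8, loudness=0.9, whistle=True),
    Segment(40.0, 45.0, motion=0.3, loudness=0.4, whistle=False),
]
print([(s.start, s.end) for s in segments if is_goal_candidate(s)])
```

In practice such rules would be tuned per sport against manually annotated footage, which is consistent with the paper's evaluation against manually identified goals.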
This paper presents an edge-based semantic classification of sports video sequences. The paper presents an algorithm for edge detection and illustrates the use of edges for semantic analysis of video content. We first propose an algorithm for detecting edges within video frames directly in the MPEG compressed domain, without decompression. The algorithm is based on a spatial-domain synthetic edge model, defined using the interrelationship of two DCT edge features: horizontal and vertical. We then use a multi-step approach to classify video sequences into meaningful semantic segments such as "goal", "foul", and "crowd" in basketball games using an "edgeness" criterion. Finally, we show how an audio feature ("whistles") can be used as a filter to enhance the edge-based semantic classification.
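As a rough illustration of how two directional DCT edge features might feed an "edgeness" measure, here is a hedged sketch assuming 8x8 luminance DCT blocks (as in MPEG I-frames): the lowest vertical-frequency and horizontal-frequency AC coefficients are taken as proxies for horizontal and vertical edge strength. The paper's actual synthetic edge model is more elaborate; the function names and the combination rule here are assumptions.

```python
import numpy as np

def block_edgeness(dct_block: np.ndarray) -> float:
    """dct_block: one 8x8 array of DCT coefficients (luminance)."""
    horizontal = abs(dct_block[1, 0])  # lowest vertical-frequency AC term -> horizontal edge
    vertical = abs(dct_block[0, 1])    # lowest horizontal-frequency AC term -> vertical edge
    # Combine the two directional features into a single edge magnitude.
    return float(np.hypot(horizontal, vertical))

def frame_edgeness(dct_blocks: np.ndarray) -> float:
    """dct_blocks: (n, 8, 8) DCT blocks of one frame; returns mean edge strength,
    which could then drive segment-level classification (e.g. crowd vs. court shots)."""
    return float(np.mean([block_edgeness(b) for b in dct_blocks]))

# Toy stand-in for DCT blocks parsed from an MPEG I-frame.
rng = np.random.default_rng(0)
print(frame_edgeness(rng.normal(size=(6, 8, 8))))
```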
Abstract: The ISO/MPEG group has identified a wide range of application scenarios [1] for their emerging MPEG-7 standard on audio-visual metadata. TV Anytime, with its vision of future digital TV services [2], encompasses a large number of them. As TV Anytime has also identified metadata as one of the key requirements for realising its vision, MPEG-7 is the natural candidate to fill that role. Here, we describe technically how metadata for the TV Anytime scenario can be created using MPEG-7.
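The abstract gives no concrete example, so the sketch below only illustrates what generating a small MPEG-7 style document might look like. The element names follow the general Mpeg7/Description/MultimediaContent structure of the standard, but the fragment is simplified, not validated against the MPEG-7 schema, and the title is invented.

```python
import xml.etree.ElementTree as ET

NS = "urn:mpeg:mpeg7:schema:2001"   # MPEG-7 schema namespace
ET.register_namespace("mpeg7", NS)

def q(tag: str) -> str:
    """Qualify a tag with the MPEG-7 namespace."""
    return f"{{{NS}}}{tag}"

# Minimal, unvalidated description of one video item with a title.
root = ET.Element(q("Mpeg7"))
description = ET.SubElement(root, q("Description"))
content = ET.SubElement(description, q("MultimediaContent"))
video = ET.SubElement(content, q("Video"))
creation_info = ET.SubElement(video, q("CreationInformation"))
creation = ET.SubElement(creation_info, q("Creation"))
title = ET.SubElement(creation, q("Title"))
title.text = "Evening News"         # invented example title

print(ET.tostring(root, encoding="unicode"))
```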