Tools and systems for content-based access to multimedia (image, video, audio, graphics, text, and any number of combinations) have proliferated in the last decade. We've seen a common theme of developing automatic analysis techniques for deriving metadata (data describing information in the content at both syntactic and semantic levels). Such metadata facilitates developing innovative tools and systems for multimedia information retrieval, summarization, delivery, and manipulation. Many interesting demonstrations of potential applications and services have emerged: finding images visually similar to a chosen picture (or sketch); summarizing videos with thumbnails of keyframes; finding video clips of a specific event, story, or person; and producing a two-minute skim of an hour-long program. (Audio-visual skims are condensed media clips that summarize information in the content.)

There's much excitement and buzz created by these fancy applications. But people are always asking, What will be engineering's Holy Grail for content-based media analysis in practical applications? My response is that the answer is complex. It's less about a specific algorithm or service than about a rigorous methodology for formulating and evaluating content-based analysis research.
Content chain

To evaluate content-based research methodologies, we must first consider who the intended users are and whether alternative solutions exist. Hence, it's important to consider each solution within the context of the content chain, the process starting with acquisition, followed by production, processing, and finally consumption. As Figure 1 shows, content sits at a particular stage of this chain and is produced by specific production sources and methods, associated with multiple media modalities (audio, images, video, graphics, text, and so forth), and intended for specific user groups. It's important to remember that users most often don't deal with raw content just coming off the sensing devices; many content processing stages have already occurred. Figure 1 also highlights important principles critical for effective content analysis:
❚ understanding production models,
❚ fusing multimedia data, and
❚ exploring perceptual viewer models.
Areas of research

By content-based media analysis, I refer to research in the following areas:
❚ Reverse engineering of the media capturing and editing processes. Work in this area attempts to reverse engineer the capturing and editing processes and to recover the constituent content components, such as shots, scenes, and structural elements (dialogues, anchors, and so forth) in the video. By breaking videos into atomic entities at different levels, we can develop intuitive and efficient access tools.
❚ Extracting and matching objects. Similar to breaking documents into words or phrases, we can decompose images and videos into objects, from which we can derive comprehensive attributes. Much work has gone into defining adequate features and criteria for matching audio-visual content based on audio-visual properties and spatiotemporal relationships.
❚ Meaning decoding and automatic annotation...