With the continuous evolution of telecommunication and computing technologies, more and more repositories of digital video data are being developed to support a wide range of applications in digital libraries, telemedicine, distance learning, tourism, entertainment, and other areas. The proliferation of the Web has accelerated the emergence of these applications. Content‐based retrieval of video data has been the subject of extensive research since 1990. Because of the huge volume of data involved, it is crucial to develop indexing techniques that make content‐based retrieval more efficient.
The problem of video indexing is to create and maintain index structures and algorithms that support the efficient execution of queries about the contents of video presentations. Such queries may ask about features of objects or regions contained within a video, or about relationships between those objects or regions. Queries may also concern these features or relationships with respect to time. Taken separately, the spatial and temporal aspects of video indexing are nontrivial problems; each has been studied widely, and many research problems remain. Video indexing, however, must be distinguished from spatial, temporal, and spatiotemporal indexing in that the information to be indexed may include not only spatiotemporal information but possibly also high-dimensional feature data such as texture, closed-caption text, shape, color histograms, and object trajectories or animation operations. A video indexing technique must, therefore, support efficient searches for objects and images on the basis of the three major facets of a video: its spatial, temporal, and feature values.
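To make the three facets concrete, the following minimal Python sketch shows what a query combining spatial, temporal, and feature constraints might look like when answered by a brute-force scan. It is an illustration only, not the object model or any index structure from this chapter: the names `VideoObject`, `query`, `boxes_overlap`, and `intervals_overlap`, the bounding-box and interval representations, and the use of Euclidean distance on a three-bin feature vector are all assumptions made for the example.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Interval = Tuple[float, float]            # (start, end), in seconds


@dataclass(frozen=True)
class VideoObject:
    """Hypothetical record holding the three facets of one video object."""
    label: str
    bbox: Box                      # spatial extent
    interval: Interval             # temporal extent
    features: Tuple[float, ...]    # feature values, e.g. a tiny color histogram


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles intersect (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed time intervals intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def query(objects: List[VideoObject], region: Box, window: Interval,
          probe: Tuple[float, ...], max_distance: float) -> List[VideoObject]:
    """Linear scan: keep objects that satisfy all three constraints."""
    return [
        o for o in objects
        if boxes_overlap(o.bbox, region)
        and intervals_overlap(o.interval, window)
        and dist(o.features, probe) <= max_distance
    ]


# Made-up sample data for illustration.
sample = [
    VideoObject("car",  (0, 0, 10, 10),   (0, 5),   (1.0, 0.0, 0.0)),
    VideoObject("ball", (50, 50, 60, 60), (2, 4),   (1.0, 0.0, 0.0)),
    VideoObject("sky",  (0, 0, 100, 20),  (0, 60),  (0.0, 0.0, 1.0)),
    VideoObject("sign", (5, 5, 8, 8),     (30, 40), (0.9, 0.1, 0.0)),
]

hits = query(sample, region=(0, 0, 20, 20), window=(0, 10),
             probe=(1.0, 0.0, 0.0), max_distance=0.2)
print([o.label for o in hits])  # ['car']
```

In the sample data, each rejected object fails on a different facet: `ball` lies outside the query region, `sign` lies outside the time window, and `sky` is too far from the probe in feature space. The scan examines every object for every query, which is the cost that an index structure over all three facets is meant to avoid.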
A simple multimedia object model is presented to provide a common reference point for discussing multimedia indexing in the remainder of this chapter. The model is given by the definitions that follow.