Searching for an arbitrary part of multimedia data is very difficult, since automatic understanding of the contents is not easily realized. In this paper we develop flexible search functions for multimedia data combined with text data and other data generated while the video is produced. The major application is assumed to be presentations, although similar methods can be applied to other cases. During a presentation the speaker uses slides, pointers, and pens; such information is used to find the relevant parts of the video/audio data. We use a portion as the unit of video access. Since each portion of the video is characterized by the corresponding slide data, we can apply methods used for information retrieval. Furthermore, we assume that each portion of multimedia data is represented by one or a few pictures in the portion. From the corresponding text we can generate a set of (weighted) keywords that characterize the portion. These keywords can be used to place representative pictures in 3D space where distance corresponds to similarity. Such keywords, characteristics of the video/audio, and characteristics of the representative picture are used for classification. Movement of pointers and pens is also used. This paper discusses how to use such information for multimedia data retrieval.

Permission to make digital/hard copy of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. © 1998 ACM 0-89791-969-6/98/0002 $3.50
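As a rough illustration of how weighted keywords derived from slide text could be used to retrieve video portions, the following Python sketch weights each keyword by TF-IDF and ranks portions by cosine similarity to a query. TF-IDF and cosine similarity are standard information-retrieval choices assumed here for illustration only; the weighting scheme actually used is not stated above, and the function names and sample slide texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic words longer than two letters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 2]


def weighted_keywords(slide_texts):
    """Return one {keyword: weight} dict per portion (assumed TF-IDF weighting)."""
    docs = [Counter(tokenize(t)) for t in slide_texts]
    n = len(docs)
    # Document frequency: in how many portions each keyword occurs.
    df = Counter(term for d in docs for term in d)
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in d.items()}
        for d in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_portions(query, portion_weights):
    """Return (portion index, score) pairs, most similar portion first."""
    q = {t: float(c) for t, c in Counter(tokenize(query)).items()}
    scores = [(i, cosine(q, w)) for i, w in enumerate(portion_weights)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical slide texts, one per video portion.
slides = [
    "Relational database query optimization",
    "Video compression and audio coding",
    "Indexing video for database retrieval",
]
weights = weighted_keywords(slides)
ranking = rank_portions("audio compression", weights)
print("best matching portion:", ranking[0][0])
```

The same keyword vectors could also serve as input to a dimensionality-reduction step for placing representative pictures in 3D space; that step is not sketched here.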