INTRODUCTION

While the technology for mining text documents in large databases could be considered relatively mature, the same cannot be said for mining other important data types such as speech, music, images, and video. Yet these forms of multimedia data are becoming increasingly prevalent on the internet and on intranets as bandwidth rapidly increases, driven by continuing advances in computing hardware and by consumer demand. A major emerging problem is the lack of accurate and efficient tools for querying these multimedia data directly, so we are usually forced to rely on available metadata such as manual labels. Currently the most effective way to label data for searching multimedia archives is for humans to review the material themselves. This is already uneconomic or, in an increasing number of application areas, quite impossible, because these data are being collected much faster than any group of humans could meaningfully label them, and the pace is accelerating, forming a veritable explosion of non-text data. Some driver applications are emerging from heightened security demands in the 21st century, from the postproduction of digital interactive television, and from the recent deployment of a planetary sensor network overlaid on the internet backbone.
BACKGROUND

Although it is said that a picture is worth a thousand words, computer scientists know that the ratio of information contained in images compared to text documents is often much greater than this. Providing text labels for image data is problematic because appropriate labeling depends heavily on the typical queries users will wish to perform, and those queries are difficult to anticipate at labeling time. For example, a simple image of a red ball might best be labeled as sports equipment, a toy, a red object, a round object, or even a sphere, depending on the query.

A valuable prototype application that could be deployed on IrisNet is a wide-area person recognition and location service. Such services have existed since the emergence of human society, serving to locate specific persons when they are not in immediate view. For example, in a crowded shopping mall, a mother may ask her child, "Have you seen your sister?" A positive response may then be followed by a request for the time and place of the last sighting, or perhaps by a request to go look for her. Here the mother is using the eyes, face recognition ability, memory persistence, and mobility of the child to perform the search. If the search fails, the mother may then ask the mall manager to make a "lost child" announcement over the public address system. Eventually the police may be asked to employ these human search services on a much wider scale, showing a photograph of the missing child on television to ask the wider community for assistance in the search.

On IrisNet, the mother could simply upload a photograph of her child from the image store in her mobile phone, and the system would efficiently look for the child in an ever-widening geographic search space until contact was made. Cle...
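The "ever-widening geographic search space" described above can be pictured as an expanding-ring search: query the sensor nodes nearest the last known location first, then rings of progressively larger radius until a match is found. The sketch below is a minimal illustration of that idea only; the node layout, the `match` predicate standing in for face recognition, and all names are assumptions for this example, not IrisNet's actual API.

```python
import math

def ring_of(nodes, center, inner_km, outer_km):
    """Return nodes whose distance from center falls in [inner_km, outer_km)."""
    cx, cy = center
    return [n for n in nodes
            if inner_km <= math.hypot(n["x"] - cx, n["y"] - cy) < outer_km]

def expanding_search(nodes, center, match, max_radius_km=50.0, step_km=1.0):
    """Query rings of increasing radius; return the first matching sighting."""
    inner = 0.0
    while inner < max_radius_km:
        outer = inner + step_km
        for node in ring_of(nodes, center, inner, outer):
            if match(node):   # stand-in for face recognition on the node's camera feed
                return node
        inner = outer         # widen the search to the next ring
        step_km *= 2          # geometric growth keeps the number of rings small
    return None               # no sighting within the maximum radius
```

Growing the ring width geometrically means a sighting at distance *d* is found after only O(log *d*) rings, while still contacting the nearest (and most likely) nodes first.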