Abstract. Continuous audio-visual surveillance is used to ensure the physical safety of critical infrastructure such as airports, nuclear power plants and national laboratories. To this end, traditional surveillance systems place cameras, microphones and other sensory input devices in appropriate locations [Sch99]. These facilities are arranged in a hierarchy of physical zones reflecting the secrecy of the guarded information. Guards carry clearances that admit them only to the appropriate zones of the hierarchy, and they monitor the facilities using devices such as hand-held displays that show streaming media of the guarded zones, possibly with accompanying instructions. The main security constraint in this model is that a guard may view streams emanating from locations whose secrecy level is equal to or lower than the guard's clearance, but not higher. We show how to model these surveillance requirements using the Synchronized Multimedia Integration Language (SMIL) [Aya01] with appropriate security enhancements. Our solution imposes a multi-level security model on SMIL documents to specify surveillance requirements. Our access control model ensures that a multimedia stream can be displayed on a device only if the security clearance of the display device dominates the security classification of the monitored zone. Additionally, we pre-process a set of cover stories that can be released during emergencies, which allows the services of guards with lower clearances to be used without disclosing data at higher sensitivity levels. To this end, we create a view for each security level and show that these views are semantically coherent and comply with the specified security policies.
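The dominance constraint and the per-level views can be illustrated with a minimal sketch. It is not the SMIL-based construction itself: the level names, the `Stream` record, the file names, and the rule that a cover story is substituted only in emergency mode are all assumptions made for illustration, and the levels are assumed to be totally ordered.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Level(IntEnum):
    # Hypothetical, totally ordered levels; the abstract does not name them.
    UNCLASSIFIED = 0
    SECRET = 1
    TOP_SECRET = 2


@dataclass(frozen=True)
class Stream:
    src: str                         # hypothetical media source for a zone
    zone_level: Level                # secrecy level of the monitored zone
    cover_src: Optional[str] = None  # optional pre-processed cover story


def dominates(clearance: Level, zone_level: Level) -> bool:
    """A clearance dominates a zone level if it is equal or higher."""
    return clearance >= zone_level


def view_for(streams: List[Stream], clearance: Level,
             emergency: bool = False) -> List[str]:
    """Build the list of sources a device with this clearance may display.

    Streams from dominated zones are shown as-is. Other streams are
    omitted, unless this is an emergency and a cover story exists
    (an assumed substitution rule).
    """
    view = []
    for s in streams:
        if dominates(clearance, s.zone_level):
            view.append(s.src)
        elif emergency and s.cover_src is not None:
            view.append(s.cover_src)
    return view


# Hypothetical zones and media files.
streams = [
    Stream("lobby.mpg", Level.UNCLASSIFIED),
    Stream("lab.mpg", Level.SECRET, cover_src="lab_cover.mpg"),
    Stream("vault.mpg", Level.TOP_SECRET),
]

print(view_for(streams, Level.UNCLASSIFIED))
print(view_for(streams, Level.UNCLASSIFIED, emergency=True))
```

In this sketch a guard never receives a source from a zone above their clearance; during an emergency the lower-level view gains only the cover story, not the protected stream.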