In recent decades we have witnessed a growing need for security in many public environments. This need has led to the proliferation of cameras and microphones, which are a suitable solution because of their relatively low maintenance cost, the possibility of installing them virtually everywhere and, finally, their capability of analyzing more complex events. However, the main limitation of traditional audio-video surveillance systems lies in the so-called psychological overload of the human operators responsible for security, which reduces their ability to analyze raw data flows from multiple sources of multimedia information. For these reasons, it would be very useful to design an intelligent surveillance system able to provide images and video with a semantic interpretation, in an attempt to bridge the gap between their low-level representation in terms of pixels and the high-level, natural language description that a human would give of them. The aim of this thesis [11] is to address these issues, which are as fascinating as they are challenging.

The proposed system analyzes videos by extracting the trajectories of the objects populating the scene (tracking). The trajectory is a highly discriminant feature, since the movement of objects in a scene is not random but has an underlying structure that can be exploited to build models. The main novelties of the proposed tracking algorithm lie in the following aspects: first, the entire history of each object populating the scene is analyzed by means of a Finite State Automaton; second, the information related to each object is updated by a graph-based approach; finally, occlusions are properly managed by tracking single objects and groups of objects in different ways [10] [7][8].
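To make the idea of analyzing an object's history with a Finite State Automaton more concrete, the following is a minimal sketch in Python. It is not the automaton used in the thesis: the state names (`NEW`, `TRACKED`, `OCCLUDED`, `LOST`), the two events (`"match"` and `"miss"`) and the transition table are all hypothetical, chosen only to illustrate how per-frame observations could drive the state of a tracked object.

```python
from enum import Enum, auto


class State(Enum):
    """Hypothetical object states; the thesis may use different ones."""
    NEW = auto()
    TRACKED = auto()
    OCCLUDED = auto()
    LOST = auto()


# Assumed transition table: (current state, event) -> next state.
# "match" means the object was associated with a detection in the frame,
# "miss" means no detection could be associated with it.
TRANSITIONS = {
    (State.NEW, "match"): State.TRACKED,
    (State.NEW, "miss"): State.LOST,
    (State.TRACKED, "match"): State.TRACKED,
    (State.TRACKED, "miss"): State.OCCLUDED,
    (State.OCCLUDED, "match"): State.TRACKED,
    (State.OCCLUDED, "miss"): State.LOST,
    (State.LOST, "match"): State.LOST,
    (State.LOST, "miss"): State.LOST,
}


def run_automaton(events, start=State.NEW):
    """Return the sequence of states visited while consuming the events."""
    state = start
    history = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        history.append(state)
    return history


if __name__ == "__main__":
    # An object that is detected, missed for one frame, then detected again.
    for visited in run_automaton(["match", "miss", "match"]):
        print(visited.name)
```

Keeping the transitions in a table rather than in nested conditionals makes it easy to see that every state handles every event, and that the decision for the current frame depends on the state accumulated from the whole history of the object rather than on the last observation alone.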
The proposed tracking algorithm was evaluated in an international competition (PETS 2013), ranking among the top places for all the considered scores out of a large number of participants (more than thirty). Once extracted, this large number of trajectories needs to be indexed and properly stored in order to improve the overall performance of the system during the retrieval step [9]. The main novelty of this module is the enhancement of off-the-shelf solutions, namely PostGIS (the spatial extension of the traditional PostgreSQL database), so that they can deal with trajectories, which are very complex elements to manage because of their spatio-temporal nature. In general, the main advantage of the proposed approach is that a human operator can interact with the system in different ways: first of all, the operator is informed by the system as soon as an abnormal behavior occurs. Clearly, the system has to be robust enough to deal with the errors that typically occur during the tracking phase, related for instance to broken

Correspondence to: Recommended for acceptance by ELCVIA