This article introduces a cognitive model and an experimental system for the representation and real-time processing of music and multimedia, based on both artificial intelligence (AI) and traditional techniques. First, it discusses some basic issues concerning the requirements of AI-based music systems and the class of problems our system addresses (music and multimedia composition, real-time performance, and analysis). Then our proposed representation scheme, described on the basis of an implemented system called HARP, is presented. The scheme integrates different formalisms able to manage the differing nature and levels of music and multimedia objects, from the symbolic, knowledge level, dealing with multilevel abstract representations and reasoning mechanisms, to the subsymbolic level, dealing with low-level processes, images, and physical signals. In our model, the symbolic level integrates a semantic network language of the KL-ONE family with a temporal logic language and production rules. The subsymbolic level is modelled as a set of agents, mainly based on dynamic systems; some examples of the use of abstract potentials are described here. In keeping with our bottom-up approach to AI and multimedia, we are mainly interested in the study and development of "autonomous" multimedia systems, that is, autonomous agents characterized by multimodal interaction with users in unstructured and evolving environments. These scenarios include a theatrical automation project, in which the system is charged with managing and integrating sound, music, and three-dimensional computer animation of humanoid figures interacting with real actors on stage, and a museum application, in which the system controls in real time the behaviour of a sensor-based autonomous mobile system in an exhibition area, able to welcome, entertain, guide, and instruct visitors.
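The abstract does not specify how the subsymbolic agents based on abstract potentials operate; purely as an illustration of the general potential-field idea (all names and parameters here are assumptions, not the HARP design), a minimal sketch of an agent descending the gradient of a potential that combines attraction to a goal with repulsion from an obstacle might look like:

```python
# Illustrative potential-field agent (hypothetical; not the HARP API).
# The agent moves by gradient descent on an "abstract potential" that is
# the sum of an attractive quadratic well at a goal and a repulsive pole
# at an obstacle.
import math

def potential(pos, goal, obstacle, k_att=1.0, k_rep=0.5):
    """Attractive quadratic well at `goal` plus repulsion near `obstacle`."""
    dgx, dgy = pos[0] - goal[0], pos[1] - goal[1]
    dox, doy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    d_obs = math.hypot(dox, doy) + 1e-9          # avoid division by zero
    return 0.5 * k_att * (dgx ** 2 + dgy ** 2) + k_rep / d_obs

def step(pos, goal, obstacle, lr=0.1, eps=1e-4):
    """One gradient-descent step, with the gradient taken by central differences."""
    gx = (potential((pos[0] + eps, pos[1]), goal, obstacle)
          - potential((pos[0] - eps, pos[1]), goal, obstacle)) / (2 * eps)
    gy = (potential((pos[0], pos[1] + eps), goal, obstacle)
          - potential((pos[0], pos[1] - eps), goal, obstacle)) / (2 * eps)
    return (pos[0] - lr * gx, pos[1] - lr * gy)

# Drive the agent from the origin toward the goal, skirting the obstacle.
goal, obstacle = (5.0, 5.0), (2.0, 3.5)
pos = (0.0, 0.0)
for _ in range(200):
    pos = step(pos, goal, obstacle)
```

After 200 steps the agent settles near the goal, displaced slightly by the residual repulsion; real-time variants of this idea recompute the gradient at sensor rate rather than iterating to convergence.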