Abstract-The purpose of our research is to develop a humanoid museum guide robot that performs intuitive, multimodal interaction with multiple persons. In this paper, we present a robotic system that makes use of visual perception, sound source localization, and speech recognition to detect, track, and involve multiple persons into interaction. Depending on the audio-visual input, our robot shifts its attention between different persons. In order to direct the attention of its communication partners towards exhibits, our robot performs gestures with its eyes and arms. As we demonstrate in practical experiments, our robot is able to interact with multiple persons in a multimodal way and to shift its attention between different people. Furthermore, we discuss experiences made during a two-day public demonstration of our robot.