We present a robot that works together with humans on a joint construction task. In this kind of interaction, the robot must be able to take on different roles to achieve efficient collaboration. To this end, we introduce embodied multimodal fusion, a new approach for processing data from the robot's input modalities. Using this method, we implemented two robot roles. In the instructive role, the robot mainly tells the user how to proceed with the construction. In the supportive role, the robot hands the user assembly pieces that match the current step of the assembly plan. We present a user evaluation that investigates how humans react to these two roles. The main finding of this evaluation is that users do not prefer either role; instead, they take on the role complementary to the robot's and adjust their own behaviour to the robot's actions. The factors with the greatest influence on user satisfaction in this kind of interaction were the number of times users picked up a building piece without an explicit instruction from the robot, which had a positive influence, and the number of utterances the users made themselves, which had a negative influence.