Humanoid robots are already among us, and they are beginning to assume more social and personal roles, such as guiding and assisting people. They should therefore interact in a human-friendly manner, using not only verbal cues but also synchronized non-verbal and para-verbal cues. However, currently available robots cannot communicate in this multimodal way: they can only perform predefined gesture sequences, handcrafted to accompany specific utterances. In this paper, we propose a model based on three different approaches to extend the communicative behaviour of humanoid robots with upper-body gestures synchronized with speech for novel utterances, exploiting part-of-speech grammatical information, prosody cues, and a combination of both. User studies confirm that our methods produce natural, appropriate, and well-timed gesture sequences synchronized with speech, using both beat and emblematic gestures.