One key challenge in creating believable embodied conversational agents (ECAs) is producing engaging behavior, and feedbacks (short verbal, vocal, and gestural reactions produced while listening to the main speaker) play an important role. In this paper we propose a machine-learning-based model of multimodal feedbacks. The goal is to learn, from a corpus of human-human interactions, when a virtual agent should display a feedback and of which type. For the approach to be feasible, it must also operate in real time, using reliably extractable features. To this end, we trained random forests on different feature sets, using annotated corpora of task-oriented interactions. Our case study is the training of doctors in breaking bad news to a patient (played by an actor or by the ECA). The performance of the method highlights its capacity to predict verbal and non-verbal feedbacks from a small number of features characterizing temporal information, in particular silence and the position of the last feedback.

CCS CONCEPTS
• Computing methodologies → Artificial intelligence; Intelligent agents; Feature selection.
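As a rough illustration only (this is not the authors' pipeline, features, or data), the following sketch shows how a random-forest feedback predictor of this kind could be set up with scikit-learn, assuming two hypothetical temporal features (current silence duration and time since the agent's last feedback) and synthetic feedback-type labels:

```python
# Minimal sketch, not the paper's actual code: a random forest predicting
# whether and which feedback to produce from two temporal cues of the kind
# the abstract names (silence and position of the last feedback).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical training data: one row per dialogue time step.
# Columns: [silence_duration_s, time_since_last_feedback_s]
X = rng.uniform(0.0, 5.0, size=(1000, 2))
# Illustrative labels: 0 = no feedback, 1 = verbal, 2 = non-verbal.
y = rng.integers(0, 3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

print("accuracy:", clf.score(X_test, y_test))
# Feature importances indicate which temporal cue the forest relies on most.
print("feature importances:", clf.feature_importances_)
```

At run time, the same two features can be recomputed at each time step and fed to `clf.predict`, which is what makes a small, temporal feature set attractive for real-time use.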