This article explores rapidly advancing efforts to endow robots with social intelligence, in the form of multilingual and multimodal emotion recognition and emotion-aware decision-making, so that they can behave in contextually appropriate ways and engage in cooperative social human–robot interaction in the healthcare domain. The objective is to make social robots trustworthy and versatile, capable of human-friendly and human-assistive interactions, and better able to assist human users by sensing, adapting, and responding appropriately to their requirements while taking into account their wider affective and motivational states and their behaviour. We propose an innovative approach to this difficult research challenge that goes beyond the conventional robotic sense-think-act loop. The proposed architecture addresses a wide range of the social cooperation skills and features required for real human–robot social interaction, including language and vision analysis, dynamic emotional analysis (long-term affect and mood), semantic mapping to improve the robot’s knowledge of the local context, situational knowledge representation, and emotion-aware decision-making. Fundamental to this architecture is a normative ethical and social framework adapted to the specific challenges of robots engaging with caregivers and care-receivers.