The human ability to empathise is a core aspect of successful interpersonal relationships. In this regard, humanrobot interaction can be improved through the automatic perception of empathy, among other human attributes, allowing robots to affectively adapt their actions to interactants' feelings in any given situation. This paper presents our contribution to the generalised track of the One-Minute Gradual (OMG) Empathy Prediction Challenge by describing our approach to predict a listener's valence during semi-scripted actor-listener interactions. We extract visual and acoustic features from the interactions and feed them into a bidirectional long shortterm memory network to capture the time-dependencies of the valence-based empathy during the interactions. Generalised and personalised unimodal and multimodal valence-based empathy models are then trained to assess the impact of each modality on the system performance. Furthermore, we analyse if intrasubject dependencies on empathy perception affect the system performance. We assess the models by computing the concordance correlation coefficient (CCC) between the predicted and self-annotated valence scores. The results support the suitability of employing multimodal data to recognise participants' valence-based empathy during the interactions, and highlight the subject-dependency of empathy. In particular, we obtained our best result with a personalised multimodal model, which achieved a CCC of 0.11 on the test set.