Many species communicate using multimodal combinations of signals. Elephants live in multi-level societies in which individuals regularly separate and reunite. Upon reunion, elephants often engage in elaborate greeting rituals in which they use vocalisations and body acts produced with different body parts and perceived through various sensory modalities (e.g., audible, tactile). However, whether these body acts represent communicative gestures, and whether elephants combine vocalisations and gestures during greeting, is still unknown. Here we use separation-reunion events to explore the greeting behaviour of semi-captive elephants (Loxodonta africana). We investigate whether elephants direct silent-visual, audible, and tactile gestures at their audience according to the audience’s state of visual attention, and how they combine these gestures with vocalisations during greeting. We show that elephants select gesture modality appropriately according to their audience’s visual attention, suggesting first-order intentional communicative use. We further show that elephants integrate vocalisations and gestures into different combinations and orders. The most frequent combination consists of rumble vocalisations with ear-flapping gestures, used most often between females. By showing that a species evolutionarily distant from our own primate lineage is sensitive to its audience’s visual attention when gesturing and combines gestures with vocalisations, our study advances our understanding of the emergence of first-order intentionality and multimodal communication across taxa.