While previous studies have documented that toddlers learn less well from passive screens than from live interaction, the rise of interactive digital screen media opens new perspectives, since some work has shown that toddlers can learn similarly well from a human present via video chat as from live exposure. The present study aimed to disentangle the role of human presence from other aspects of social interactions on learning advantages in contingent screen settings. We assessed 16-month-old toddlers' fast mapping of novel words from screen in three conditions: in-person, video chat, and virtual agent. All conditions built on the same controlled and scripted interaction. In the in-person condition, toddlers learned two novel word-object associations from an experimenter present in the same room and reacting contingently to the toddler's gaze direction. In the video chat condition, the toddler saw the experimenter in real time on screen, while the experimenter only had access to the toddler's real-time gaze position as captured by an eyetracker. This setup allowed contingent reactivity to the toddler's gaze while controlling for any cues beyond these instructions. In the virtual agent condition, the agent was programmed to follow the toddler's gaze, smile, and name the object with the same parameters as the experimenter in the other conditions. After the learning phase, all toddlers were tested on their word recognition in a looking-while-listening paradigm. Comparisons against chance revealed that only toddlers in the in-person group showed above-chance word learning. Toddlers in the virtual agent group performed significantly worse than those in the in-person group, while performance in the video chat group overlapped with that of the other two groups.
These results confirm that in-person interaction leads to the best learning outcomes even in the absence of rich social cues. They also indicate that contingency alone is not sufficient, and that for toddlers to learn from interactive digital media, more cues to social agency are required.