What triggered the emergence of uniquely human behaviors (language, religion, music) some 100,000 years ago? A non-circular, speculative theory based on the mother-infant relationship is presented. Infant “cuteness” evokes the infant schema and motivates nurturing; the analogous mother schema (MS) is a multimodal representation of the carer from the fetal/infant perspective, motivating fearless trust. Prenatal MS organizes auditory, proprioceptive, and biochemical stimuli (voice, heartbeat, footsteps, digestion, body movements, biochemicals) that depend on maternal physical/emotional state. In human evolution, bipedalism and encephalization led to earlier births and more fragile infants. Cognitively more advanced infants survived by better communicating with and motivating (manipulating) mothers and carers. The ability to link arbitrary sound patterns to complex meanings improved (proto-language). Later in life, MS and associated emotions were triggered in ritual settings by repetitive sounds and movements (early song, chant, rhythm, dance), subdued light, dull auditory timbre, psychoactive substances, unusual tastes/smells and postures, and/or a feeling of enclosure. Operant conditioning can explain why such actions were repeated. Reflective consciousness emerged as infant-mother dyads playfully explored intentionality (theory of mind, agent detection) and carers predicted and prevented fatal infant accidents (mental time travel). The theory is consistent with cross-cultural commonalities in altered states (out-of-body, possessing, floating, fusing), spiritual beings (large, moving, powerful, emotional, wise, loving), and reports of strong musical experiences and divine encounters. Evidence is circumstantial and cumulative; falsification is problematic.