Infants learn their native language(s) at an amazing speed. Before they even talk, their perception adapts to the language(s) they hear. However, the mechanisms responsible for this perceptual attunement remain unclear and are at the heart of heated debates in psychology, linguistics, philosophy and neuroscience. The dominant explanation posits that infants apply a domain-general learning mechanism that consists of learning statistical regularities from the speech stream they hear. Such a general learning mechanism has been proposed to account for perceptual attunement effects in both auditory and visual learning, and in both primates and non-primates. Other theories that account for perceptual attunement claim that infants are born with an innate, specialized language-learning device that allows them to learn quickly and effortlessly from the language(s) they are exposed to. Critically, the feasibility of a purely domain-general statistical learning mechanism has been demonstrated only with computational models using unrealistic and simplified input. Here we propose to simulate early language acquisition from 2,000 hours of ecological child-centered audio data in American English and Metropolitan French. We show that when applied to ecologically valid data, generic learning mechanisms do develop a language-relevant perceptual space but fail to show evidence of perceptual attunement. It is only when supplemented with domain-specific audio filtering and augmentation mechanisms that computational models show significant attunement to the language they have been exposed to. Hence, we conclude that, when learning from ecological audio, domain-specific mechanisms may be necessary to guide early language learning in the wild, even if the learning itself is done through generic mechanisms. We anticipate our work to be a starting point for ecologically valid computational models of perceptual attunement.