Speech signals have a unique long-term modulation spectrum whose shape is distinct from that of environmental noise, music, and non-speech vocalizations. Does the human auditory system adapt to the speech long-term modulation spectrum and efficiently extract critical information from speech signals? To answer this question, we tested whether neural responses to speech signals can be captured by specific modulation spectra of non-speech acoustic stimuli. We generated amplitude-modulated (AM) noise with the speech modulation spectrum and with 1/f modulation spectra of different exponents to imitate the temporal dynamics of different natural sounds. We presented these AM stimuli and a 10-minute piece of natural speech to 19 human participants undergoing electroencephalography (EEG) recording. We derived temporal response functions (TRFs) to the AM stimuli of different spectrum shapes and found distinct neural dynamics for each type of TRF. We then used the TRFs of the AM stimuli to predict neural responses to the speech signals, and found that 1) the TRFs derived from AM stimuli with modulation spectrum exponents of 1, 1.5 and 2 preferentially captured EEG responses to speech signals in the delta band and 2) the theta band of neural responses to speech can be captured by the AM stimuli with an exponent of 0.75. Our results suggest that the human auditory system shows specificity to the long-term modulation spectrum and is equipped with characteristic neural algorithms tailored to extract critical acoustic information from speech signals.

Significance Statement

Speech signals have a unique long-term modulation spectrum shape that distinguishes speech from other natural sounds. Does the human auditory system adapt to the speech long-term modulation spectrum and efficiently extract critical information from speech signals?
To answer this question, we generated artificial sounds with various modulation spectra and examined whether neural encoding models derived from specific modulation spectra can explain neural responses to speech signals better than others. We found that the modulation spectra with exponents close to that of the speech modulation spectrum captured EEG responses to speech signals better than the other modulation spectra did. Our results suggest that the human auditory system shows high sensitivity to the long-term modulation spectrum specific to speech signals.
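
As an illustration of the kind of stimulus described above, the following is a minimal sketch of how AM noise with a 1/f modulation spectrum of a given exponent could be synthesized. It is not the procedure used in this study. The function names, the 0.5 to 32 Hz modulation band, the sampling rate, the Gaussian noise carrier, and the convention that the exponent applies to modulation power (so that amplitude scales as f^(-exponent/2)) are all assumptions made for the example.

```python
import numpy as np


def one_over_f_envelope(n_samples, fs, exponent, f_lo=0.5, f_hi=32.0, seed=0):
    """Envelope whose modulation power falls off as 1/f**exponent.

    Assumption: the exponent is defined on power, so the amplitude
    spectrum is shaped as f**(-exponent / 2) inside [f_lo, f_hi] Hz
    and set to zero outside that band.
    """
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    band = (freqs >= f_lo) & (freqs <= f_hi)

    amplitude = np.zeros_like(freqs)
    amplitude[band] = freqs[band] ** (-exponent / 2.0)

    # Random phases give a different envelope for every seed while
    # leaving the modulation spectrum shape unchanged.
    phase = rng.uniform(0.0, 2.0 * np.pi, size=freqs.shape)
    envelope = np.fft.irfft(amplitude * np.exp(1j * phase), n=n_samples)

    # Shift and scale into [0, 1]; this changes only the DC term and the
    # overall gain, not the spectral slope within the band.
    envelope -= envelope.min()
    envelope /= envelope.max()
    return envelope


def am_noise(duration_s, fs, exponent, seed=0):
    """Gaussian noise carrier multiplied by a 1/f**exponent envelope."""
    n_samples = int(round(duration_s * fs))
    envelope = one_over_f_envelope(n_samples, fs, exponent, seed=seed)
    carrier = np.random.default_rng(seed + 1).standard_normal(n_samples)
    return envelope * carrier, envelope


# Hypothetical usage: one stimulus per exponent mentioned in the abstract.
stimuli = {a: am_noise(10.0, 1000, a, seed=i)
           for i, a in enumerate([0.75, 1.0, 1.5, 2.0])}
```

Under this convention, a larger exponent concentrates modulation energy at slower rates, which is one way to picture why different exponents might engage delta-band and theta-band responses differently.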