Spontaneous real-life speech is imperfect in many ways. It contains disfluencies and ill-formed utterances that the brain must contend with in order to extract meaning from it. Here, we studied how the neural speech-tracking response is affected by three specific factors that are prevalent in spontaneous colloquial speech: (1) the presence of non-lexical fillers, (2) the need to detect syntactic boundaries in disfluent speech, and (3) the effort involved in processing syntactically complex phrases. Neural activity was recorded with electroencephalography (EEG) while individuals listened to an unscripted, spontaneous narrative. The narrative was analyzed in a time-resolved fashion to identify fillers, detect syntactic boundaries, and assess the syntactic complexity of different phrases. When these factors were incorporated into the speech-tracking analysis, the neural response was affected by all three. The most consistent effect, observed for all three factors, was a modulation of a centro-frontal negative response peaking around 350 ms. This response closely resembles the well-known N400 ERP component, which is linked to various aspects of lexical access and semantic processing. It was observed for lexical words but not for fillers, was larger for words opening a clause than for words closing it, and was enhanced for high-complexity phrases. These findings extend ongoing efforts to understand the neural processing of speech under increasingly realistic conditions. They highlight the importance of considering the imperfect nature of real-life spoken language, linking past research on linguistically well-formed and meticulously controlled speech to the type of speech that the brain actually deals with on a daily basis.