Speech comprehension is severely compromised when several people talk at once, due to limited perceptual and cognitive resources. Under some circumstances, listeners can employ top-down attention to prioritize the processing of task-relevant speech. However, whether the system can effectively represent more than one speech input remains highly debated. Here we studied how task-relevance affects the neural representation of concurrent speakers under two extreme conditions: when only one speaker was task-relevant (Selective Attention) versus when both speakers were equally relevant (Distributed Attention). Neural activity was measured using magnetoencephalography (MEG), and we analysed the speech-tracking responses to both speakers. Crucially, we explored different hypotheses as to how the brain may have represented the two speech streams, without making a priori assumptions regarding participants' internal allocation of attention. Results indicate that neural tracking of the two concurrent speech streams did not fully mirror their instructed task-relevance. When Distributed Attention was required, we observed a tradeoff between the two speakers despite their equal task-relevance, akin to the top-down modulation observed during Selective Attention. This points to the system's inherent limitation in fully processing two concurrent speech streams and highlights the complex nature of attention, particularly for continuous speech.