Speech and music signals show rhythmicity in their temporal structure, with slower rhythmic rates in music than in speech. Speech processing has been related to brain rhythms in the auditory and motor cortex at around 4.5 Hz, while music processing has been associated with motor cortex activity at around 2 Hz, reflecting the respective temporal structures of the two domains. In addition, slow motor cortex brain rhythms have been suggested to be central to timing in both domains. It thus remains unclear whether domain-general or frequency-specific mechanisms drive speech and music processing. Furthermore, for speech processing, auditory-motor cortex coupling and perception-production synchronization at 4.5 Hz have been related to enhanced auditory perception in various tasks. However, it is unknown whether this effect generalizes to synchronization and perception in music at distinct optimal rates. Using a behavioral protocol, we investigated whether (1) perception-production synchronization shows distinct optimal rates for speech and music, and (2) optimal rates in perception are predicted by synchronization strength at different timescales. A perception task involving speech and music stimuli and a synchronization task using tapping and whispering were conducted at slow (~2 Hz) and fast (~4.5 Hz) rates. Results revealed that synchronization was generally better at slow rates. Importantly, at slow but not at fast rates, tapping outperformed whispering, suggesting domain-specific rate preferences. Accordingly, synchronization performance was highly correlated across domains at fast but not at slow rates. Altogether, perception of speech and music was optimal at different timescales and was predicted by auditory-motor synchronization strength. Our data suggest different optimal timescales for music and speech processing, with partially overlapping mechanisms.