How are listeners able to follow and enjoy complex pieces of music? Several theoretical frameworks suggest links between the process of listening and the formal structure of music, involving a division of the musical surface into structural units at multiple hierarchical levels. Whether boundaries between structural units are perceivable to listeners unfamiliar with the style, and whether naïve listeners and experts identify them congruently, remains unclear. Here, we focused on the case of Indian music, and asked 65 Western listeners (of mixed levels of musical training; most unfamiliar with Indian music) to intuitively segment into phrases recordings of sitar ālāp in two different rāga-modes. Each recording was also segmented by two experts, who identified boundary regions at section and phrase levels. Participant- and region-wise scores were computed on the basis of whether "clicks" fell inside or outside boundary regions (hits/false alarms) and whether they were inserted earlier or later within those regions (high/low "promptness"). We found substantial agreement, expressed as hit rates and click densities, among participants, and between participants' and experts' segmentations. Agreement and promptness scores differed between participants, levels, and recordings. We found no effect of musical training, but detected real-time awareness of grouping completion and boundary hierarchy. The findings may be explained by underlying general bottom-up processes, implicit learning of structural relationships, cross-cultural musical similarities, or universal cognitive capacities.
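
The click-based scoring described above can be illustrated with a minimal sketch. It is not the study's actual procedure: the `Region` type, the `score_clicks` function, the rule that only the first click in a region counts towards promptness, and the linear promptness formula (1 at the start of a boundary region, 0 at its end) are all assumptions made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Expert-annotated boundary region, in seconds (hypothetical type)."""
    start: float
    end: float


def score_clicks(clicks, regions):
    """Score one participant's clicks against boundary regions.

    Assumed definitions (not taken from the study):
      - hit: a region containing at least one click
      - false alarm: a click falling outside every region
      - promptness: 1 - (t - start) / (end - start) for the first
        click in a region, so earlier clicks score closer to 1
    """
    hit_regions = set()
    promptness = []
    false_alarms = 0

    for t in sorted(clicks):
        # Find the first region that contains this click, if any.
        matched = next(
            (i for i, r in enumerate(regions) if r.start <= t <= r.end),
            None,
        )
        if matched is None:
            false_alarms += 1
            continue
        if matched in hit_regions:
            continue  # later clicks in an already-hit region are ignored
        hit_regions.add(matched)
        r = regions[matched]
        width = r.end - r.start
        promptness.append(1.0 - (t - r.start) / width if width > 0 else 1.0)

    return {
        "hit_rate": len(hit_regions) / len(regions) if regions else 0.0,
        "false_alarms": false_alarms,
        "mean_promptness": (
            sum(promptness) / len(promptness) if promptness else None
        ),
    }


if __name__ == "__main__":
    # Made-up example data: two boundary regions and four clicks.
    regions = [Region(10.0, 14.0), Region(30.0, 34.0)]
    clicks = [11.0, 20.0, 33.0, 13.0]
    print(score_clicks(clicks, regions))
```

Region-wise scores could be obtained in the same spirit by aggregating, for each region, over participants instead of over regions.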