We describe an approach for detecting and segmenting humans with extensive posture articulations in crowded video sequences. In our method we learn a set of posture clusters and a codebook of local shape distributions for humans in various postures. Instances of the codebook entries cast votes for the locations of humans in the video and their respective postures. Subsequently, consistent hypotheses are found as maxima within a voting space. Finally, the segmentation of humans in the scene is initialized by the corresponding posture clusters, and contours are evolved to obtain precise and consistent segmentations. Our experimental results indicate that the framework provides a simple yet effective means for aggregating shape-based cues. The proposed method is capable of detecting and segmenting humans in crowded scenes as they perform a diverse set of activities and undergo a wide range of articulations within different contexts.
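The voting step described above can be illustrated with a minimal sketch, assuming a simplified Hough-style scheme: matched codebook entries cast weighted votes for human center locations, and detection hypotheses are recovered as local maxima of the accumulated voting space. The function names, grid resolution, and thresholds here are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

def accumulate_votes(votes, shape):
    """Accumulate (x, y, weight) votes into a 2D voting space."""
    space = np.zeros(shape)
    for x, y, w in votes:
        if 0 <= y < shape[0] and 0 <= x < shape[1]:
            space[y, x] += w
    return space

def find_maxima(space, threshold, radius=2):
    """Return (x, y) cells above `threshold` that dominate their
    (2*radius+1)^2 neighborhood, i.e., consistent hypotheses."""
    peaks = []
    h, w = space.shape
    for y in range(h):
        for x in range(w):
            v = space[y, x]
            if v < threshold:
                continue
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            if v >= space[y0:y1, x0:x1].max():
                peaks.append((x, y))
    return peaks

# Example: three codebook matches agree on one center; a stray vote does not.
votes = [(5, 5, 1.0), (5, 5, 0.8), (5, 5, 0.6), (2, 2, 0.3)]
space = accumulate_votes(votes, (10, 10))
print(find_maxima(space, threshold=1.0))  # -> [(5, 5)]
```

In the full framework, each maximum would additionally carry a posture estimate used to initialize the subsequent contour-based segmentation.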