Crowd modeling in computer vision usually assumes a single, generic type of crowd, which is an oversimplification. In this paper we adopt a taxonomy widely accepted in sociology and focus on one category, the spectator crowd, formed by people "interested in watching something specific that they came to see" [6]. Such crowds are found in stadiums, amphitheaters, cinemas, and similar venues. In particular, we propose a novel dataset, Spectators Hockey (S-HOCK), which covers 4 hockey matches from an international tournament. The dataset has been extensively annotated, focusing on the spectators at different levels of detail. At the higher level, people are labeled according to the team they support and whether they know the people sitting close to them. At the lower levels, standard pose information (head and body) is annotated, together with fine-grained actions such as hands on hips and clapping. The game field has also been annotated, making it possible to relate what happens in the match to the behavior of the crowd. This resulted in more than 100 million annotations, useful both for standard applications such as people counting and head pose estimation and for novel tasks such as spectator categorization. For all of these tasks we provide evaluation protocols and baseline results, to encourage further research.
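The multi-level labeling described above can be pictured as one record per spectator per frame. The sketch below is hypothetical: the class name `SpectatorAnnotation`, its field names, and the label values are illustrative only and do not reproduce the actual S-HOCK annotation format. It also shows how a basic task such as people counting, here split by supported team, reduces to a simple query over such records.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SpectatorAnnotation:
    """Hypothetical per-spectator, per-frame record (illustrative field names only)."""
    person_id: int
    frame: int
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    # Higher-level labels
    supported_team: str              # team the spectator supports
    knows_neighbors: bool            # whether they know the people close to them
    # Lower-level labels
    head_pose: str                   # e.g. "left", "frontal", "right"
    body_pose: str                   # e.g. "seated", "standing"
    actions: List[str] = field(default_factory=list)  # fine-grained, e.g. ["clapping"]


def count_per_team(annotations: List[SpectatorAnnotation], frame: int) -> Counter:
    """Count annotated spectators in one frame, grouped by supported team."""
    return Counter(a.supported_team for a in annotations if a.frame == frame)


anns = [
    SpectatorAnnotation(1, 0, (10, 20, 30, 60), "team_a", True, "frontal", "seated", ["clapping"]),
    SpectatorAnnotation(2, 0, (50, 22, 28, 58), "team_a", True, "left", "standing"),
    SpectatorAnnotation(3, 0, (90, 25, 31, 61), "team_b", False, "frontal", "seated", ["hands_on_hips"]),
]
print(count_per_team(anns, frame=0))  # Counter({'team_a': 2, 'team_b': 1})
```

Keeping the high-level (social) and low-level (pose, action) labels in the same record makes it easy to cross them, for example counting clapping spectators per team in a given frame.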
Abstract. We propose a new type of crowd analysis, focused on the spectator crowd, that is, people "interested in watching something specific that they came to see" [1]. This scenario applies to stadiums, amphitheaters, and similar venues, and shares some aspects with classical crowd monitoring: many people are observed simultaneously, so per-person analysis is hard. Here, however, human dynamics are more constrained because of the architectural environment: people are expected to stay in a fixed location most of the time, and their activities are limited to applauding, supporting or heckling the players, and talking with their neighbors. In this paper we begin to address this challenge on hockey matches, placing a video camera 25-30 meters from the bleachers and pointing it at the crowd. In this setting, groups of spectators that exhibit similar behavior are detected, and their behavior is classified into a set of predefined classes that highlight the overall excitement. To this end, we first focus on individual frames, clustering local flow measures into spatial regions. The clustering is then extended along the temporal axis, looking for non-random spatio-temporal clusters by means of the Lempel-Ziv complexity. In this way, choral (collective) activities can emerge, revealing, for example, fan groups supporting different teams. Finally, entropic measures are used to quantify the degree of excitement of such groups.
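The abstract above relies on the Lempel-Ziv complexity to separate structured (non-random) activity from random fluctuation, and on entropic measures to quantify excitement. The sketch below is not the authors' code. It assumes that each spatial cluster is summarized by a 1-D time series of flow magnitudes, binarizes that series at its median (a hypothetical encoding), computes the Lempel-Ziv (1976) complexity with the Kaspar-Schuster parsing scheme, and uses the Shannon entropy of behavior labels as one possible entropic measure. The helper names `binarize`, `normalized_lz`, and `shannon_entropy` are illustrative.

```python
import math
from collections import Counter
from statistics import median


def binarize(signal):
    """Map a 1-D motion signal to a '0'/'1' string, thresholding at its median.

    Hypothetical preprocessing step: the paper does not specify this encoding.
    """
    m = median(signal)
    return "".join("1" if x > m else "0" for x in signal)


def lempel_ziv_complexity(s):
    """Lempel-Ziv (1976) complexity: number of distinct phrases found while
    parsing s left to right (Kaspar-Schuster algorithm)."""
    n = len(s)
    if n < 2:
        return n
    i, k, l = 0, 1, 1  # candidate start, match length, start of current phrase
    c, k_max = 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                      # extend the current match
            if l + k > n:               # reached the end while copying
                c += 1
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:                  # no earlier start reproduces the phrase
                c += 1                  # close a new phrase
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c


def normalized_lz(s, alphabet_size=2):
    """Complexity divided by n / log_a(n), its asymptotic value for random sequences."""
    n = len(s)
    if n < 2:
        return 0.0
    return lempel_ziv_complexity(s) * math.log(n, alphabet_size) / n


def shannon_entropy(labels):
    """Shannon entropy in bits of a sequence of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


rhythmic = [0.1, 0.9] * 20   # a region moving in a regular rhythm, e.g. clapping
print(lempel_ziv_complexity(binarize(rhythmic)))  # 3
```

A regular pattern is parsed into very few phrases, while an irregular one produces many, so comparing `normalized_lz` across regions is one way to flag non-random spatio-temporal activity. The entropy of per-frame behavior labels within a group is likewise only one simple choice of entropic measure.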
We focus on the automated analysis of spectator crowds, that is, people watching live sports contests (in stadiums, amphitheaters, etc.) or, more generally, people "watching the activities of an event [...] interested in watching something specific that they came to see" [2]. This scenario differs substantially from the typical crowd analysis setting (e.g., pedestrians): here human dynamics are more constrained because of the architectural environment, and people are expected to stay in a fixed location most of the time, limiting their activities to applauding, supporting or heckling the players, and talking with their neighbors. In this paper we begin to address this challenge with a social signal processing approach, which grounds computer vision techniques in social theories. More specifically, building on social theories of expressive bodily conduct, we show how computer vision techniques can distinguish fan groups supporting different teams by automatically detecting their liveliness at different moments of the match, even when the groups are intermixed in the stands. Moreover, we show that automatically detecting crowd motion in the stands is enough to single out the most salient events of the match, such as goals, fouls, or shots on goal.
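The claim that crowd motion alone can point to salient match events can be illustrated with a deliberately simple stand-in: frame differencing as a proxy for crowd motion, followed by a z-score threshold on the smoothed signal. This is not the method of the paper. The function names, the moving-average `window`, and the 3-sigma `z_thresh` are hypothetical choices, and the input is assumed to be grayscale frames already cropped to the stands.

```python
import numpy as np


def motion_energy(frames):
    """Mean absolute difference between consecutive grayscale frames.

    frames: array-like of shape (T, H, W); returns an array of length T - 1.
    """
    frames = np.asarray(frames, dtype=np.float64)
    return np.abs(np.diff(frames, axis=0)).mean(axis=(1, 2))


def salient_frames(energy, window=5, z_thresh=3.0):
    """Indices where the moving-average motion energy exceeds mean + z_thresh * std.

    Assumes len(energy) >= window; returns an empty array for a flat signal.
    """
    energy = np.asarray(energy, dtype=np.float64)
    kernel = np.ones(window) / window
    smooth = np.convolve(energy, kernel, mode="same")  # same length as energy
    mu, sigma = smooth.mean(), smooth.std()
    if sigma == 0:
        return np.array([], dtype=int)
    return np.flatnonzero(smooth > mu + z_thresh * sigma)
```

A global z-score assumes a roughly stationary baseline over the whole sequence; over a full match, a sliding-window baseline or a per-region signal would likely be more robust, and the flagged frames would still need to be matched against the annotated game events.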