We propose an object detection system that takes the locations of tracked low-level feature points as input and produces a set of independent coherent motion regions as output. As an object moves, the tracked feature points on it span a coherent 3D region in the space-time volume defined by the video. When multiple objects move, many possible coherent motion regions can be constructed around the set of all feature point tracks. Our approach identifies all possible coherent motion regions and extracts the subset that maximizes an overall likelihood function, subject to assigning each point track to at most one motion region. We solve the problem of finding the best set of coherent motion regions with a simple greedy algorithm, and show that our approach produces semantically correct detections and counts of similar objects moving through crowded scenes.
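As a rough illustration of the selection step, the Python sketch below visits candidate regions in descending score order and accepts a region only if none of its tracks has already been claimed, so each track ends up in at most one region. It assumes each candidate is already given as a set of track IDs with a precomputed log-likelihood score, and that the overall objective is the sum of those per-region scores. The names `greedy_select` and `Candidate`, and the scores themselves, are hypothetical and do not reproduce the paper's likelihood function.

```python
from typing import FrozenSet, List, Set, Tuple

# Hypothetical representation of a candidate coherent motion region:
# (set of feature-point track IDs it covers, assumed log-likelihood score).
Candidate = Tuple[FrozenSet[int], float]


def greedy_select(candidates: List[Candidate]) -> List[Candidate]:
    """Greedily pick candidate regions so that each track is used at most once.

    Assumes the overall objective is the sum of per-region scores, so a
    region is worth adding only if its score is positive.
    """
    selected: List[Candidate] = []
    used_tracks: Set[int] = set()
    # Visit candidates from highest to lowest score.
    for tracks, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if score <= 0:
            break  # every remaining candidate would lower the total score
        if used_tracks.isdisjoint(tracks):
            selected.append((tracks, score))
            used_tracks |= tracks  # claim these tracks for this region
    return selected


# Toy example: four candidate regions over six tracks.
candidates: List[Candidate] = [
    (frozenset({1, 2, 3}), 5.0),
    (frozenset({3, 4}), 4.0),   # shares track 3 with the first region
    (frozenset({4, 5}), 3.0),
    (frozenset({6}), -1.0),     # negative score, never worth adding
]
result = greedy_select(candidates)
print(len(result))  # object count under this toy setup
```

Here the `{3, 4}` region is rejected because track 3 is already taken, and the count of detected objects is simply the number of selected regions. A greedy pass like this is fast but is not guaranteed to find the globally best subset, which is the usual trade-off of greedy set-packing heuristics.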