In this paper, we consider the problem of people counting in video surveillance. This is an important task in video analysis, because the resulting data can be used for predictive analytics, improvement of customer services, traffic control, and similar applications. The proposed methods are based on object tracking and can operate on sparse frames, which makes them faster and keeps their computing requirements minimal. As a baseline we use the algorithm from [1], which is based on object tracking by head detections. Head tracking in the baseline has proved more robust and accurate, since heads are less susceptible to occlusions. However, this approach has two disadvantages: people differ in height, so their heads lie in different planes and the raised signal line is not clearly defined, and as a consequence the accuracy of people counting may decrease. In the baseline, these problems were solved using a head-to-body linear regression that had to be retrained for each scene, which complicates practical use of the algorithm. In this paper, we propose a new neural network head-to-body regressor that solves the mentioned problems at once. We also use a new visual tracking algorithm that allowed us to speed up our solution. We introduce two methods: a distributed modification of the baseline algorithm with high people counting accuracy, and a solution that can run on a single processor core. Our experimental evaluation confirms the validity of the proposed modifications.