Abstract: This paper proposes performance measures that evaluate object tracking algorithms using object labels and sizes. The usefulness and effectiveness of the proposed evaluation measures are demonstrated by reporting the performance evaluation of two tracking algorithms. We then compare the proposed evaluation measures with those in related work to demonstrate that they are more suitable.
“…Most papers in their experimental evaluation use a limited number of videos. For example, only six videos are being used in [24]. And, the often used BoBoT dataset, tested in [25], consists of ten different video sequences.…”
Section: Data For Tracker Evaluation
“…The Performance Evaluation of Tracking and Surveillance, PETS, workshop series was one of the first to evaluate trackers with ground truth, proposing performance measures for comparing tracking algorithms. Other performance measures for tracking are proposed by [38] and [24], as well as in [36]. In the more recent PETS series, [29], VACE [39], and CLEAR [40] metrics were developed for evaluating the performance of multiple target detection and tracking, while in case of single object tracking evaluation there is no consensus and many variations of the same measures are being proposed.…”
A large variety of trackers has been proposed in the literature over the last two decades, with mixed success. Object tracking in realistic scenarios is a difficult problem; it therefore remains one of the most active areas of research in computer vision. A good tracker should perform well in a large number of videos involving illumination changes, occlusion, clutter, camera motion, low contrast, specularities, and at least six more aspects. However, the performance of proposed trackers has typically been evaluated on fewer than ten videos, or on special-purpose datasets. In this paper, we aim to evaluate trackers systematically and experimentally on 315 video fragments covering the above aspects. We selected a set of nineteen trackers to include a wide variety of algorithms often cited in the literature, supplemented with trackers appearing in 2010 and 2011 for which the code was publicly available. We demonstrate that trackers can be evaluated objectively by survival curves, Kaplan-Meier statistics, and Grubbs testing. We find that in evaluation practice the F-score is as effective as the object tracking accuracy (OTA) score. The analysis under a large variety of circumstances provides objective insight into the strengths and weaknesses of trackers.
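The abstract above compares the F-score against the OTA score but does not spell out how either is computed. The following is a minimal sketch of one common frame-level F-score formulation for single-object tracking, assuming axis-aligned `(x, y, w, h)` boxes and an IoU overlap threshold of 0.5; the function names, box convention, and threshold are illustrative assumptions, not the paper's exact definition.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def f_score(predictions, ground_truth, thresh=0.5):
    """Frame-level F-score for single-object tracking.

    predictions / ground_truth: index-aligned lists, one entry per frame;
    each entry is a box, or None (tracker reports loss / object absent).
    """
    tp = fp = fn = 0
    for pred, gt in zip(predictions, ground_truth):
        if pred is not None and gt is not None:
            if iou(pred, gt) >= thresh:
                tp += 1            # sufficient overlap: true positive
            else:
                fp += 1            # output present but off-target ...
                fn += 1            # ... so the object is also missed
        elif pred is not None:
            fp += 1                # false alarm: output with no object
        elif gt is not None:
            fn += 1                # miss: object present, no output
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

With a perfect track the score is 1.0, and it degrades as overlap-failing or missing frames accumulate, which is what makes it usable as a per-video summary in large-scale comparisons like the one described above.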
“…Examples include the framework introduced by PETS (Performance Evaluation of Tracking and Surveillance), ETISEO (Evaluation du Traitement et de l'Interpretation de Sequences vidEO), CAVIAR (Context Aware Vision using Image-based Active Recognition), CLEAR (Classification of Events, Activities and Relationships). Other smaller-scale evaluation frameworks include comprehensive proposals such as the one in [1], and simple approaches such as the one based on "pseudo-synthetic video" sequences [2], on frame-based and object-based metrics [3], on the Label and Size Based Evaluation Measure (LSBEM) [4], or on measuring the tracking difficulty using a reflective model [5]. None of these frameworks has yet been widely taken up by the research community.…” [This work was supported in part by the EU, under the FP7 project APIDIS (ICT-216023).]
The growing interest in developing video tracking algorithms has not been accompanied by the development of commonly used evaluation criteria to assess and compare their performance. Researchers often present trackers' results on different datasets and evaluate them with different performance measures, thus hindering both formative and summative quality assessment. In this paper, we present a protocol for evaluating the performance of tracking algorithms that tests video trackers using a set of trials on a pre-defined set of sequences, enabling objective and reproducible performance evaluation of trackers against ground-truth information. Each trial highlights the strengths and weaknesses of a tracker on test scenarios simulated on real sequences that represent real-world conditions. Moreover, a new evaluation measure is introduced that summarizes the performance of a tracker based on the lost-track-ratio curve. The validity and effectiveness of the proposed protocol are demonstrated experimentally on three trackers, and its implementation is made available online to the research community.
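The excerpt above does not define the lost-track-ratio curve. A plausible illustrative formulation, sketched below, counts a frame as "lost" when the tracker's overlap with ground truth falls below a threshold, and plots the lost fraction as that threshold varies; the area under such a curve can then summarize a tracker in a single number. The IoU-based overlap, function names, and box convention here are all assumptions, not the paper's exact measure.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned (x, y, w, h) boxes."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def lost_track_ratio(predictions, ground_truth, thresh):
    """Fraction of annotated frames where the track counts as lost,
    i.e. no output or overlap with ground truth below thresh."""
    lost = total = 0
    for pred, gt in zip(predictions, ground_truth):
        if gt is None:
            continue               # skip frames with no annotation
        total += 1
        if pred is None or iou(pred, gt) < thresh:
            lost += 1
    return lost / total if total else 0.0

def ltr_curve(predictions, ground_truth, thresholds):
    """Lost-track ratio as a function of the overlap threshold.
    The curve is non-decreasing: stricter thresholds lose more frames."""
    return [lost_track_ratio(predictions, ground_truth, t)
            for t in thresholds]
```

A stricter threshold can only turn more frames into losses, so the curve rises monotonically from lenient to strict thresholds; comparing two trackers' curves shows not just whether one is better but at which overlap quality levels.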