One of the most important factors in training object recognition networks using convolutional neural networks (CNNs) is the provision of annotated data accompanying human judgment. Particularly, in object detection or semantic segmentation, the annotation process requires considerable human effort. In this paper, we propose a semi-supervised learning (SSL)-based training methodology for object detection, which makes use of automatic labeling of un-annotated data by applying a network previously trained from an annotated dataset. Because an inferred label by the trained network is dependent on the learned parameters, it is often meaningless for re-training the network. To transfer a valuable inferred label to the unlabeled data, we propose a re-alignment method based on co-occurrence matrix analysis that takes into account one-hot-vector encoding of the estimated label and the correlation between the objects in the image. We used an MS-COCO detection dataset to verify the performance of the proposed SSL method and deformable neural networks (D-ConvNets) [1] as an object detector for basic training. The performance of the existing state-of-the-art detectors YOLO v2 [2], and single shot multi-box detector (SSD) [3]) can be improved by the proposed SSL method without using the additional model parameter or modifying the network architecture.
Background and Objective: Context-aware computer-assisted surgical systems require real time automatic and accurate surgical workflow recognition. Since several years, video has been the most common modality used to develop such methods. With the democratization of robotic-assisted surgery and segmentation method, new modalities are now accessible, such as kinematics. Some previous works used these new modalities as input of their models, but the added value of these modalities is rarely studied. This paper presents the design and results of the "PEg TRAnsfert Workflow recognition" (PETRAW) challenge whose objective was to develop surgical workflow recognition methods based on one or several modalities in order to study their added value. Methods: The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set was composed of videos, kinematics, semantic segmentation, and workflow annotations which described the sequences at three different granularity levels: phase, step, and activity. Five tasks were proposed to the participants: three of them were related to the recognition of all granularities with one of the available modalities, while the others addressed the recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as evaluation metric to take unbalanced classes into account and because it is more clinically relevant than a frame-by-frame score. Results: Seven teams participated in at least one task and four of them in all tasks. Best results are obtained with the use of the video and the kinematics data with an AD-Accuracy between 93% and 90% for the four teams who participated in all tasks.
Conclusion:The improvement between video/kinematic-based methods and the uni-modality ones was significant for all of the teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods has to be taken into consideration. Is it relevant to spend 20 to 200 times more computing time for less than 3% of improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.
Traditional pedestrian collision warning systems sometimes raise alarms even when there is no danger (e.g., when all pedestrians are walking on the sidewalk). These false alarms can make it difficult for drivers to concentrate on their driving. In this paper, we propose a novel framework for an end-to-end pedestrian collision warning system based on a convolutional neural network. Semantic segmentation information is used to train the convolutional neural network and two loss functions, such as cross entropy and Euclidean losses, are minimized. Finally, we demonstrate the effectiveness of our method in reducing false alarms and increasing warning accuracy compared to a traditional histogram of oriented gradients (HoG)-based system.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.