A deep learning-based binary classifier was proposed to diagnose tuberculosis (TB) and non-TB disease from chest X-ray radiographs. The proposed classifier is a two-step binary decision tree in which each step is a convolutional neural network (CNN) trained with the PyTorch framework. In the first step, chest X-ray images were classified as normal or abnormal. In the second step, images predicted to be abnormal were further classified as TB or non-TB disease. The first and second steps achieved accuracies of 98% and 80%, respectively. Moreover, re-training could improve the stability of prediction accuracy across images from different data groups.
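The two-step decision logic can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `two_step_diagnosis`, the use of classifier callables that return a probability, and the 0.5 threshold are all assumptions, and the trained CNNs are stood in for by arbitrary callables.

```python
from typing import Callable

# Hypothetical type: a binary classifier maps an image to P(positive class).
# In the described system each step would be a trained CNN; any callable works here.
BinaryClassifier = Callable[[object], float]


def two_step_diagnosis(
    image: object,
    abnormal_clf: BinaryClassifier,
    tb_clf: BinaryClassifier,
    threshold: float = 0.5,  # assumed decision threshold for both steps
) -> str:
    """Route an image through a two-step binary decision tree.

    Step 1: normal vs. abnormal.
    Step 2 (only reached for abnormal images): TB vs. non-TB disease.
    """
    if abnormal_clf(image) < threshold:
        return "normal"
    if tb_clf(image) >= threshold:
        return "TB"
    return "non-TB disease"
```

In a cascade like this, an image misrouted at step 1 never reaches the TB/non-TB classifier, so first-step errors carry over into the final diagnosis.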
When an anchor-based object detector is trained on a sparsely annotated dataset, the difficulty of collecting effective positive examples can degrade performance. Because anchor-based detectors select positive examples by thresholding the IoU between anchors and ground-truth bounding boxes, objects that are present in a sparsely annotated image but not annotated can be assigned as negative examples, i.e., treated as background. We attempt to solve this problem with two approaches: 1) using an anchor-less object detector and 2) using a single-object tracker for semi-supervised learning-based object detection. The proposed technique performs bidirectional single-object tracking in videos, starting from the sparsely annotated bounding boxes, to obtain dense annotations. Applying our method to the EPIC-KITCHENS-55 dataset, we achieved first place in the Seen section and runner-up performance in the Unseen section of the EPIC-KITCHENS 2020 object detection challenge under IoU > 0.5.
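The failure mode described above can be illustrated with a generic IoU-based anchor assignment. This is a minimal sketch, not the authors' detector: the helper names `iou` and `assign_labels` and the thresholds (positive at IoU ≥ 0.5, negative below 0.4, ignored in between) are illustrative assumptions; real detectors use various thresholds.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_labels(
    anchors: Sequence[Box],
    gt_boxes: Sequence[Box],
    pos_thr: float = 0.5,  # assumed positive threshold
    neg_thr: float = 0.4,  # assumed negative threshold
) -> List[int]:
    """Label each anchor 1 (positive), 0 (negative/background) or -1 (ignored)."""
    labels = []
    for anchor in anchors:
        # Best overlap with any *annotated* box; unannotated objects are invisible here.
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    return labels
```

For example, if an image contains objects at `(0, 0, 10, 10)` and `(20, 20, 30, 30)` but only the first is annotated, an anchor sitting exactly on the second object is labeled 0 (background). Once the missing box is supplied, for instance by propagating annotations with a tracker, the same anchor becomes a positive.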
Algorithms for video action recognition must consider not only spatial information but also temporal relations, which remains challenging. We propose a 3D-CNN-based action recognition model, called the blockwise temporal-spatial pathway network (BTSNet), which can adjust its temporal and spatial receptive fields through multiple pathways. Our design is inspired by adaptive kernel selection-based models, architectures for effective feature encoding that adaptively choose spatial receptive fields for image recognition. Extending this approach to the temporal domain, our model extracts temporal and channel-wise attention and uses it to fuse information from various candidate operations. For evaluation, we tested the proposed model on the UCF-101, HMDB-51, SVW, and EPIC-KITCHENS datasets and showed that it generalizes well without pretraining. BTSNet also provides interpretable visualizations based on spatiotemporal channel-wise attention. Based on these visualizations, we confirm that the blockwise temporal-spatial pathway yields better representations for 3D convolutional blocks.
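The adaptive selection idea that BTSNet builds on can be sketched as a selective-kernel-style fusion: candidate branch outputs are summed, pooled into a channel descriptor, turned into per-branch logits, and combined with a softmax taken across branches for each channel. This is a simplified sketch, not BTSNet itself: it covers only channel-wise selection (no separate temporal attention), replaces the usual reduction layer with one hypothetical C x C weight matrix per branch, and the name `select_and_fuse` is an assumption.

```python
import math
from typing import List

# A feature map is a list of channels; each channel is a flat list of
# activations over (time x height x width) positions.
FeatureMap = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_and_fuse(
    branches: List[FeatureMap], weights: List[List[List[float]]]
) -> FeatureMap:
    """Fuse K candidate branch outputs with channel-wise attention.

    branches: K feature maps, each with C channels of equal length.
    weights:  K hypothetical C x C matrices mapping the pooled descriptor
              to per-channel logits for each branch.
    """
    num_channels = len(branches[0])
    # 1) Element-wise sum over branches, then global average pooling per channel.
    descriptor = []
    for c in range(num_channels):
        summed = [sum(vals) for vals in zip(*(b[c] for b in branches))]
        descriptor.append(sum(summed) / len(summed))
    # 2) Per-branch logits for each channel: logits[k][c].
    logits = [
        [sum(row[j] * descriptor[j] for j in range(num_channels)) for row in w]
        for w in weights
    ]
    # 3) Softmax across branches for each channel, then attention-weighted sum.
    fused = []
    for c in range(num_channels):
        attn = softmax([logits[k][c] for k in range(len(branches))])
        fused.append([
            sum(a * b[c][i] for a, b in zip(attn, branches))
            for i in range(len(branches[0][c]))
        ])
    return fused
```

With all-zero weights every branch receives equal attention, so the result is the plain average of the candidates; learned weights let each channel favor whichever operation (and hence receptive field) suits it.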