MegDet: A Large Mini-Batch Object Detector

Peng, Chao; Xiao, Tete; Li, Zeming; Jiang, Yuning; Zhang, Xiangyu; Jia, Kai; Yu, Gang; Sun, Jian

doi:10.1109/cvpr.2018.00647

Cited by 294 publications

(220 citation statements)

References 40 publications

Supporting

Mentioning

217

Contrasting


“…This implies that our HRNet benefits more from longer training. Table 11 reports the comparison of our network to state-of-the-art single-model object detectors on COCO test-dev without using multi-scale training and multi-scale testing that are done in [65], [77], [88], [93], [102], [103]. In the Faster R-CNN framework, our networks perform better than ResNets with similar parameter and computation complexity: HRNetV2p-W32 vs. ResNet-101-FPN, HRNetV2p-W40 vs. ResNet-152-FPN, HRNetV2p-W48 vs. X-101-64×4d-FPN.…”

Section: COCO Object Detection (mentioning)

confidence: 99%

Deep High-Resolution Representation Learning for Visual Recognition

Wang, Sun, Cheng, et al. 2021

IEEE Trans. Pattern Anal. Mach. Intell.

2,305

1,479


High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named as High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. There are two key characteristics: (i) Connect the high-to-low resolution convolution streams in parallel; (ii) Repeatedly exchange the information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that the HRNet is a stronger backbone for computer vision problems. All the codes are available at https://github.com/HRNet.

1 INTRODUCTION

Deep convolutional neural networks (DCNNs) have achieved state-of-the-art results in many computer vision tasks, such as image classification, object detection, semantic segmentation, human pose estimation, and so on. The strength is that DCNNs are able to learn richer representations than conventional hand-crafted representations. Most recently-developed classification networks, including AlexNet [59], VGGNet [101], GoogleNet [108], ResNet [39], etc., follow the design rule of LeNet-5 [61]. This is depicted in Figure 1 (a): gradually reduce the spatial size of the feature maps, connect the convolutions from high resolution to low resolution in series, and lead to a low-resolution representation, which is further processed for classification.

High-resolution representations are needed for position-sensitive tasks, e.g., semantic segmentation, human pose estimation, and object detection. The previous state-of-the-art methods adopt the high-resolution recovery process to raise the representation resolution from the low-resolution representation outputted by a classification or classification-like network as depicted in Figure 1 (b), e.g., Hourglass [83], SegNet [3], DeconvNet [85], U-Net [95], SimpleBaseline [124], and encoder-decoder [90]. In addition, dilated convolutions are used to remove some down-sample layers and thus yield medium-resolution representations [15], [144].

We present a novel architecture, namely High-Resolution Net (HRNet), which is able to maintain high-resolution representations through the whole process. We start from a high-resolution convolution stream, gradually add high-to-low resolution convolution streams one by one, and connect the multi-resolution streams in parallel. The resulting network

• J. Wang is with Microsoft Research,


“…(ii) The fact that some regions are over-sampled and some are undersampled might have adverse effects on learning, as the size of sample (i.e. batch size) is known to be related to the optimal learning rate [166].…”

Section: Imbalance in Overlapping BBs (mentioning)

confidence: 99%

Imbalance Problems in Object Detection: A Review

Öksüz, Cam, Kalkan, et al. 2021

IEEE Trans. Pattern Anal. Mach. Intell.

352

171


In this paper, we present a comprehensive review of the imbalance problems in object detection. To analyze the problems in a systematic manner, we introduce two taxonomies; one for the problems and the other for the proposed solutions. Following the taxonomy for the problems, we discuss each problem in depth and present a unifying yet critical perspective on the solutions in the literature. In addition, we identify major open issues regarding the existing imbalance problems as well as imbalance problems that have not been discussed before. Moreover, in order to keep our review up to date, we provide an accompanying webpage which categorizes papers addressing imbalance problems, according to our problem-based taxonomy. Researchers can track newer studies on this webpage available at: https://github.com/kemaloksuz/ObjectDetectionImbalance.


“…Notably, due to limitations on our computational resources, we could not increase the batch size inputted into DeepLabV3+ beyond a size of 8, whereas U-Net could use a batch size up to 32 (Table 3). In theory, a larger mini-batch size should help the network converge to a better minimum and therefore better final accuracy [38]. However, we did not see the benefit of larger batch-size in our model training for U-Net, where a model trained with a batch size of 4 achieved the best performance.…”

Section: Comparing the Performance of a U-Net Architecture Based Deep (mentioning)

confidence: 71%

A pilot exploratory study of the potentials of deep learning methods in cancer image segmentation and classification

Liu 2019

Preprint


Background: Tumor classification and feature quantification from H&E histology images are critical tasks for cancer diagnosis, cancer research, and treatment. However, both tasks involve tedious and time-consuming manual examination of histology images. We explored the possibilities of using deep learning methods to perform segmentation and classification of histology images of cancer tissue for their potential in computer-aided tumor diagnosis and other clinical and research applications. Specifically, we tested selected deep learning methods for their performance in the segmentation of stroma and glandular objects in tumor image data and in the classification of tumor images. We automated these tasks to help facilitate downstream tumor image analysis, reduce the labor load of pathologists, and provide them with a second opinion on their analysis.

Methods: We modified a patch-based U-Net model and trained it to perform stroma detection and segmentation in cancer tissue. Then the semantic segmentation capabilities of the U-Net model were compared with that of a DeepLabV3+ model. We also explored the possible use of transfer learning to train a patch-based model to classify cancer tissue images as carcinoma and sarcoma and to further classify them as carcinoma subtypes.

Results: In spite of the limited dataset available for the pilot study, we found that the unconventional DeepLabV3+ model performed biomedical image segmentation more effectively than U-Net when k-fold cross-validation was utilized, but U-Net still showed promise as an effective and efficient model when we used a customized validation approach. We believe that the DeepLabV3+ model can perform segmentation with even more accuracy if computation resource constraints are removed or if more data is used to augment the result. In terms of tumor classification, our selected models also consistently achieve test accuracies above 80%, with a model trained using transfer learning with VGG-16 network as the feature extractors, or convolutional base performing best. For multi-class tumor subtype classification, we also observed promising test accuracies from our models, and a customized post-processing method provided even higher prediction accuracy on test set images and this method can be further investigated.

Conclusions: This pilot exploratory study provided strong evidence for the powerful potentials of deep learning models for segmentation and classification of tumor image data.
