Modern CNN-based object detectors rely on bounding box regression and non-maximum suppression (NMS) to localize objects. While the probabilities for class labels naturally reflect classification confidence, localization confidence is absent. As a result, properly localized bounding boxes can degenerate during iterative regression or even be suppressed during NMS. In this paper, we propose IoU-Net, which learns to predict the intersection-over-union (IoU) between each detected bounding box and its matched ground truth. The network thereby acquires a localization confidence, which improves the NMS procedure by preserving accurately localized bounding boxes. Furthermore, an optimization-based bounding box refinement method is proposed, in which the predicted IoU is formulated as the objective. Extensive experiments on the MS-COCO dataset show the effectiveness of IoU-Net, as well as its compatibility with and adaptivity to several state-of-the-art object detectors.
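To make the idea of ranking boxes by localization confidence concrete, the following is a minimal sketch, not the authors' implementation. It assumes boxes are given as (x1, y1, x2, y2) tuples and that a predicted IoU per box is already available; the function names `iou` and `nms_by_localization` and the 0.5 overlap threshold are illustrative assumptions.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_by_localization(
    boxes: Sequence[Box],
    loc_conf: Sequence[float],
    overlap_thresh: float = 0.5,  # hypothetical default, not from the paper
) -> List[int]:
    """Greedy NMS that ranks boxes by predicted IoU (localization
    confidence) instead of classification score. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: loc_conf[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a better-localized kept box.
        if all(iou(boxes[i], boxes[k]) <= overlap_thresh for k in keep):
            keep.append(i)
    return keep


# Illustrative data: boxes 0 and 1 overlap heavily; box 1 is better localized.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
predicted_iou = [0.6, 0.9, 0.8]
print(nms_by_localization(boxes, predicted_iou))  # prints [1, 2]
```

In this sketch, box 0 is suppressed by box 1 because box 1 has the higher predicted IoU, regardless of what their classification scores might be.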
Aggregating extra features has been considered an effective approach to boost traditional pedestrian detection methods. However, there is still a lack of studies on whether and how CNN-based pedestrian detectors can benefit from these extra features. The first contribution of this paper is to explore this issue by aggregating extra features into a CNN-based pedestrian detection framework. Through extensive experiments, we quantitatively evaluate the effects of different kinds of extra features. Moreover, we propose a novel network architecture, namely HyperLearner, to jointly learn pedestrian detection and the given extra feature. Through multi-task training, HyperLearner is able to utilize the information in the given features and improve detection performance without extra inputs at inference time. The experimental results on multiple pedestrian benchmarks validate the effectiveness of the proposed HyperLearner.
We present the Visually Grounded Neural Syntax Learner (VG-NSL), an approach for learning syntactic representations and structures without explicit supervision. The model learns by looking at natural images and reading paired captions. VG-NSL generates constituency parse trees of texts, recursively composes representations for constituents, and matches them with images. We define the concreteness of constituents by their matching scores with images, and use it to guide the parsing of text. Experiments on the MSCOCO data set show that VG-NSL outperforms various unsupervised parsing approaches that do not use visual grounding, in terms of F1 scores against gold parse trees. We find that VG-NSL is much more stable with respect to the choice of random initialization and the amount of training data. We also find that the concreteness acquired by VG-NSL correlates well with a similar measure defined by linguists. Finally, we also apply VG-NSL to multiple languages in the Multi30K data set, showing that our model consistently outperforms prior unsupervised approaches.
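As an illustration of what an F1 score against a gold parse tree measures, the sketch below computes a bracketing F1 over constituent spans for a single sentence. It is an assumption-laden example, not the evaluation script used in the paper: each tree is represented as a collection of (start, end) token spans, and choices such as whether trivial spans are counted or how scores are averaged over a corpus are left out. The function name `bracket_f1` is hypothetical.

```python
from typing import Iterable, Set, Tuple

Span = Tuple[int, int]  # assumed representation: (start, end) token offsets


def bracket_f1(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Harmonic mean of span precision and recall for one sentence."""
    pred_set: Set[Span] = set(predicted)
    gold_set: Set[Span] = set(gold)
    if not pred_set or not gold_set:
        return 0.0
    matched = len(pred_set & gold_set)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_set)  # share of predicted spans in gold
    recall = matched / len(gold_set)     # share of gold spans recovered
    return 2 * precision * recall / (precision + recall)


# Illustrative spans for a five-token sentence; two of three spans agree.
predicted_spans = [(0, 2), (2, 5), (0, 5)]
gold_spans = [(0, 2), (3, 5), (0, 5)]
print(round(bracket_f1(predicted_spans, gold_spans), 3))  # prints 0.667
```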
Various methods have been proposed to define rainfall thresholds for landslide prediction. Once an appropriate threshold is determined, it remains the same regardless of the antecedent soil moisture conditions. However, given the important role of antecedent soil moisture in the initiation of landslides, prediction performance is expected to improve if the rainfall threshold level varies according to the antecedent soil moisture conditions. Therefore, in this study we propose a probabilistic threshold that integrates antecedent soil moisture conditions with rainfall thresholds. To take into account conditions both with and without landslides, Bayesian analysis is applied to estimate the landslide occurrence probability given various combinations of two factors: the antecedent soil moisture and the severity of the recent rainfall event. These combinations are then divided into conditions that are likely to trigger landslides and those unlikely to trigger landslides by comparing their probabilities with a critical value. In this way, the probabilistic threshold is determined. Here the soil moisture is estimated using a distributed hydrological model, and the severity of the rainfall event is characterized by cumulated event rainfall-rainfall duration (ED) thresholds with different exceedance probabilities. The proposed approach was applied to a sub-region of the Emilia-Romagna region in northern Italy. The results show that the probabilistic threshold has a better prediction performance than the ED rainfall threshold, especially in terms of reducing false alarms. This study provides an effective approach to improving the prediction capability of the ED rainfall threshold, benefiting its application in landslide prediction.
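The Bayesian step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual procedure: it assumes event counts are available for one combination of soil moisture class and rainfall severity class, estimates each term of Bayes' rule from those counts, and compares the result with a critical value. All counts, the critical value of 0.3, and the function names are hypothetical.

```python
def landslide_probability(
    n_total: int,
    n_landslide: int,
    n_condition: int,
    n_landslide_and_condition: int,
) -> float:
    """P(landslide | condition) via Bayes' rule, estimated from counts.

    'condition' stands for one combination of antecedent soil moisture
    class and rainfall severity class.
    """
    if n_total <= 0 or n_landslide <= 0 or n_condition <= 0:
        raise ValueError("counts must be positive to estimate probabilities")
    prior = n_landslide / n_total                         # P(L)
    likelihood = n_landslide_and_condition / n_landslide  # P(condition | L)
    evidence = n_condition / n_total                      # P(condition)
    return likelihood * prior / evidence


def likely_to_trigger(probability: float, critical_value: float) -> bool:
    """Classify a condition by comparing its probability to a critical value."""
    return probability >= critical_value


# Hypothetical counts: 1000 rainfall events, 50 with landslides; 40 events
# fall in the wet-soil, severe-rainfall class, 20 of them with landslides.
p = landslide_probability(1000, 50, 40, 20)
print(p)  # approximately 0.5
print(likely_to_trigger(p, critical_value=0.3))  # hypothetical critical value
```

With count-based estimates the posterior reduces to the fraction of events in the class that produced landslides (20 of 40 here); writing out the prior, likelihood, and evidence separately simply mirrors the terms of Bayes' rule.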
The Three Gorges Reservoir (TGR) is one of the largest man-made lakes in the world. Since its impoundment in 2003, however, algal blooms have often been observed in the tributary embayments. To control the algal blooms, a thorough understanding of the hydrodynamics (e.g., flow regime, velocity gradient, and velocity magnitude and direction) in the tributary embayments is particularly important. Using a calibrated three-dimensional hydrodynamic model, we carried out a hydrodynamic analysis of a typical tributary embayment (i.e., Xiangxi Bay) with emphasis on the longitudinal patterns. The results show distinct longitudinal gradients of hydrodynamics in the study area, which can be generally characterized as four zones: riverine, intermediate, lacustrine, and mainstream-influenced zones. Compared with the typical longitudinal zonation for a pure reservoir, there is an additional mainstream-influenced zone near the mouth due to the strong effects of the TGR mainstream. Blooms are prone to occur in the intermediate and lacustrine zones, whereas the hydrodynamic conditions of the riverine and mainstream-influenced zones are not propitious for the formation of algal blooms. This finding helps to diagnose the sensitive areas for algal bloom occurrence.