Detecting outliers in a large set of data objects is a major data mining task aiming at finding different mechanisms responsible for different groups of objects in a data set. All existing approaches, however, are based on an assessment of distances (sometimes indirectly by assuming certain distributions) in the full-dimensional Euclidean data space. In high-dimensional data, these approaches are bound to deteriorate due to the notorious "curse of dimensionality". In this paper, we propose a novel approach named ABOD (Angle-Based Outlier Detection) and some variants assessing the variance in the angles between the difference vectors of a point to the other points. This way, the effects of the "curse of dimensionality" are alleviated compared to purely distance-based approaches. A main advantage of our new approach is that our method does not rely on any parameter selection influencing the quality of the achieved ranking. In a thorough experimental evaluation, we compare ABOD to the well-established distance-based method LOF for various artificial and a real world data set and show ABOD to perform especially well on high-dimensional data.
Over recent years data mining has been establishing itself as one of the major disciplines in computer science with growing industrial impact. Undoubtedly, research in data mining will continue and even increase over coming decades. In this article, we sketch our vision of the future of data mining. Starting from the classic definition of "data mining", we elaborate on topics that -in our opinion -will set trends in data mining.
Multiplayer Online Battle Arena (MOBA) games are among the most played digital games in the world. In these games, teams of players fight against each other in arena environments, and the gameplay is focussed on tactical combat. In this paper, we present three data-driven measures of spatiotemporal behaviour in Defence of the Ancients 2 (DotA 2): 1) Zone changes; 2) Distribution of team members and: 3) Time series clustering via a fuzzy approach. We present a method for obtaining accurate positional data from DotA 2. We investigate how behaviour varies across these measures as a function of the skill level of teams, using four tiers from novice to professional players. Results from three analyses indicate that spatio-temporal behaviour of MOBA teams is highly related to team skill.
Monitoring tree regeneration in forest areas disturbed by resource extraction is a requirement for sustainably managing the boreal forest of Alberta, Canada. Small remotely piloted aircraft systems (sRPAS, a.k.a. drones) have the potential to decrease the cost of field surveys drastically, but produce large quantities of data that will require specialized processing techniques. In this study, we explored the possibility of using convolutional neural networks (CNNs) on this data for automatically detecting conifer seedlings along recovering seismic lines: a common legacy footprint from oil and gas exploration. We assessed three different CNN architectures, of which faster region-CNN (R-CNN) performed best (mean average precision 81%). Furthermore, we evaluated the effects of training-set size, season, seedling size, and spatial resolution on the detection performance. Our results indicate that drone imagery analyzed by artificial intelligence can be used to detect conifer seedling in regenerating sites with high accuracy, which increases with the size in pixels of the seedlings. By using a pre-trained network, the size of the training dataset can be reduced to a couple hundred seedlings without any significant loss of accuracy. Furthermore, we show that combining data from different seasons yields the best results. The proposed method is a first step towards automated monitoring of forest restoration/regeneration.
The detection performance of small objects in remote sensing images has not been satisfactory compared to large objects, especially in low-resolution and noisy images. A generative adversarial network (GAN)-based model called enhanced super-resolution GAN (ESRGAN) showed remarkable image enhancement performance, but reconstructed images usually miss high-frequency edge information. Therefore, object detection performance showed degradation for small objects on recovered noisy and low-resolution remote sensing images. Inspired by the success of edge enhanced GAN (EEGAN) and ESRGAN, we applied a new edge-enhanced super-resolution GAN (EESRGAN) to improve the quality of remote sensing images and used different detector networks in an end-to-end manner where detector loss was backpropagated into the EESRGAN to improve the detection performance. We proposed an architecture with three components: ESRGAN, EEN, and Detection network. We used residual-in-residual dense blocks (RRDB) for both the ESRGAN and EEN, and for the detector network, we used a faster region-based convolutional network (FRCNN) (two-stage detector) and a single-shot multibox detector (SSD) (one stage detector). Extensive experiments on a public (car overhead with context) dataset and another self-assembled (oil and gas storage tank) satellite dataset showed superior performance of our method compared to the standalone state-of-the-art object detectors.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.