Malware detection at scale in the Android realm is often carried out using machine learning techniques. State-of-the-art approaches such as DREBIN and MaMaDroid are reported to yield high detection rates when assessed against well-known datasets. Unfortunately, such datasets may include a large portion of duplicated samples, which may bias recorded experimental results and insights. In this article, we perform extensive experiments to measure the performance gap that occurs when datasets are de-duplicated. Our experimental results reveal that duplication in published datasets has a limited impact on supervised malware classification models. This observation contrasts with the finding of Allamanis on the general case of machine learning bias for big code. Our experiments, however, show that sample duplication more substantially affects unsupervised learning models (e.g., malware family clustering). Nevertheless, we argue that our fellow researchers and practitioners should always take sample duplication into consideration when performing machine-learning-based (via either supervised or unsupervised learning) Android malware detections, no matter how significant the impact might be.
We introduce Baseline: a library for reproducible deep learning research and fast model development for NLP. The library provides easily extensible abstractions and implementations for data loading, model development, training and export of deep learning architectures. It also provides implementations for simple, high-performance, deep learning models for various NLP tasks, against which newly developed models can be compared. Deep learning experiments are hard to reproduce, Baseline provides functionalities to track them. The goal is to allow a researcher to focus on model development, delegating the repetitive tasks to the library.
Accurate detection of sea-surface objects is vital for the safe navigation of autonomous ships. With the continuous development of artificial intelligence, electro-optical (EO) sensors such as video cameras are used to supplement marine radar to improve the detection of objects that produce weak radar signals and small sizes. In this study, we propose an enhanced convolutional neural network (CNN) named VarifocalNet* that improves object detection in harsh maritime environments. Specifically, the feature representation and learning ability of the VarifocalNet model are improved by using a deformable convolution module, redesigning the loss function, introducing a soft non-maximum suppression algorithm, and incorporating multi-scale prediction methods. These strategies improve the accuracy and reliability of our CNN-based detection results under complex sea conditions, such as in turbulent waves, sea fog, and water reflection. Experimental results under different maritime conditions show that our method significantly outperforms similar methods (such as SSD, YOLOv3, RetinaNet, Faster R-CNN, Cascade R-CNN) in terms of the detection accuracy and robustness for small objects. The maritime obstacle detection results were obtained under harsh imaging conditions to demonstrate the performance of our network model.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.