Self‐supervised learning allows neural networks to be trained without immense, high‐quality, labelled data sets. We demonstrate that self‐supervision also improves the robustness of models trained on small, imbalanced or incomplete data sets, which pose severe difficulties for supervised models. For small data sets, the accuracy of our approach is up to 12.5% higher on MNIST and 15.2% higher on Fashion‐MNIST compared to random initialization. Moreover, self‐supervision influences how models learn: with small or strongly imbalanced data sets, it prevents classes from being learned insufficiently or not at all. Even if input data are corrupted and large image regions are missing from the training set, self‐supervision significantly improves classification accuracy (up to 7.3% for MNIST and 2.2% for Fashion‐MNIST). In addition, we analyse combinations of data manipulations and seek to build a better understanding of how pretext accuracy and downstream accuracy are related. This is important not only for ensuring optimal pretraining but also, when training with unlabelled data, for finding an appropriate evaluation measure. As such, we make an important contribution to learning with realistic data sets and to making machine learning accessible to application areas that require expensive and difficult data collection.
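The core idea of self-supervised pretraining is that pretext labels come for free from the unlabelled data itself. The abstract does not name a specific pretext task, so as an illustration only, the following minimal NumPy sketch uses rotation prediction, a common choice: each image is rotated by a random multiple of 90 degrees and the rotation index serves as the pretext label.

```python
import numpy as np

def make_rotation_pretext(images, rng=None):
    """Turn unlabelled images into a supervised pretext task:
    each image is rotated by 0, 90, 180, or 270 degrees and the
    rotation index becomes the (free) pretext label."""
    rng = np.random.default_rng(rng)
    ks = rng.integers(0, 4, size=len(images))  # one rotation index per image
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, ks)])
    return rotated, ks

# Toy "unlabelled" batch of 8 square images (MNIST-sized, 28x28).
batch = np.random.default_rng(0).normal(size=(8, 28, 28))
x_pretext, y_pretext = make_rotation_pretext(batch, rng=0)
```

A network pretrained to predict `y_pretext` from `x_pretext` can then be fine-tuned on the small labelled downstream set, which is where the accuracy gains reported above come from.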
The identification of outliers is mainly based on unannotated data and therefore constitutes an unsupervised problem. The lack of labels leads to numerous challenges that do not occur, or occur only to a lesser extent, with annotated data and supervised methods. In this paper, we focus on two of these challenges: the selection of hyperparameters and the selection of informative features. To this end, we propose a method that transforms the unsupervised problem of outlier detection into a supervised one. Benchmarking our approach against common outlier detection methods shows clear advantages of our method when many irrelevant features are present. Furthermore, the proposed approach also scores very well in hyperparameter selection compared to methods with randomly selected hyperparameters.
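The abstract does not spell out how the transformation to a supervised problem works. One common construction, shown here purely as an assumed illustration, pairs the real (unlabelled) data with uniformly sampled synthetic points and labels them as two classes, so that standard supervised tooling (cross-validation for hyperparameters, feature importance for feature selection) becomes applicable:

```python
import numpy as np

def to_supervised(X, n_synthetic=None, rng=None):
    """Pair the unlabelled data X (label 0, 'inlier') with uniformly
    sampled synthetic points inside the per-feature range (label 1,
    'synthetic outlier'), yielding a binary classification problem."""
    rng = np.random.default_rng(rng)
    n = len(X) if n_synthetic is None else n_synthetic
    lo, hi = X.min(axis=0), X.max(axis=0)          # per-feature range
    synthetic = rng.uniform(lo, hi, size=(n, X.shape[1]))
    X_sup = np.vstack([X, synthetic])
    y_sup = np.concatenate([np.zeros(len(X)), np.ones(n)])
    return X_sup, y_sup

X = np.random.default_rng(1).normal(size=(100, 5))  # unlabelled data
X_sup, y_sup = to_supervised(X, rng=1)
```

Any off-the-shelf classifier can then be tuned on `(X_sup, y_sup)` with ordinary cross-validation, which is the kind of supervised hyperparameter selection the abstract benchmarks against random selection.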
Image-based quality control plays an important role in industrial production. Nevertheless, this problem long received little attention in computer vision. In recent years, this has changed: driven by publicly available datasets, a variety of methods have been proposed for detecting anomalies and defects in workpieces. In this survey, we present more than 40 methods that promise the best results for this task. In a comprehensive benchmark, we show that more datasets and metrics are needed to move the field forward. Further, we highlight strengths and weaknesses, and discuss research gaps and future research directions.