Data imbalance is a thorny issue in machine learning. SMOTE is a widely used oversampling method for imbalanced learning, but it suffers from sample overlapping, noise interference, and blind neighbor selection. To address these problems, we present a new oversampling method, OS-CCD, based on a new concept, the classification contribution degree, which determines the number of synthetic samples SMOTE generates for each positive sample. OS-CCD follows the spatial distribution of the original samples along the class boundary while avoiding oversampling from noisy points. Experiments on twelve benchmark datasets demonstrate that OS-CCD outperforms six classical oversampling methods in terms of accuracy, F1-score, AUC, and ROC curves.
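The interpolation step at the heart of SMOTE, which OS-CCD reweights per positive sample, can be sketched as follows. This is a minimal illustration only: the function name is hypothetical, and the classification contribution degree itself (which would set `n_synthetic` per sample) is not reproduced here.

```python
import numpy as np

def smote_oversample(X_pos, n_synthetic, k=5, rng=None):
    """Minimal SMOTE-style interpolation: repeatedly pick a positive
    sample, one of its k nearest positive neighbors, and a random
    point on the line segment between them."""
    rng = np.random.default_rng(rng)
    # Pairwise distances among positive samples; exclude self-matches.
    d = np.linalg.norm(X_pos[:, None, :] - X_pos[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbors = np.argsort(d, axis=1)[:, :k]  # k nearest neighbor indices
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_pos))          # a positive seed sample
        j = rng.choice(neighbors[i])          # one of its neighbors
        gap = rng.random()                    # interpolation coefficient in [0, 1)
        synthetic.append(X_pos[i] + gap * (X_pos[j] - X_pos[i]))
    return np.array(synthetic)
```

In OS-CCD, `n_synthetic` would vary per seed sample according to its classification contribution degree, concentrating synthesis near the class boundary and away from noisy points.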
The back-propagation (BP) algorithm is usually used to train convolutional neural networks (CNNs) and has driven great progress in image classification. It updates weights by gradient descent, so the farther a sample is from the target, the greater its contribution to the weight change. As a result, the influence of correctly classified samples that lie close to the classification boundary is diminished. This paper defines the classification confidence as the degree to which a sample belongs to its correct category, and divides the samples of each category into danger and safe sets according to a dynamic classification confidence threshold. A new learning algorithm is then presented that penalizes the loss function with danger samples only, rather than all samples, so that the CNN pays more attention to danger samples and learns effective information more accurately. Experiments on the MNIST dataset and three sub-datasets of CIFAR-10 show that on MNIST the accuracy of the unimproved CNN (Non-improve CNN) reached 99.246%, while that of PCNN reached 99.3%; on the three CIFAR-10 sub-datasets, the accuracies of Non-improve CNN are 96.15%, 88.93%, and 94.92%, respectively, while those of PCNN are 96.44%, 89.37%, and 95.22%, respectively.
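The danger/safe split and the penalized loss can be sketched as below. All function names are illustrative, and the choice of dynamic threshold (here, the per-category mean confidence) is an assumption standing in for the paper's actual rule:

```python
import numpy as np

def softmax(z):
    """Numerically stable row-wise softmax."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def danger_mask(logits, labels):
    """Confidence = predicted probability of the true class.
    A sample is 'danger' if its confidence falls below the mean
    confidence of its own category (a simple dynamic threshold;
    the paper's threshold rule may differ)."""
    p = softmax(logits)
    conf = p[np.arange(len(labels)), labels]
    mask = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = labels == c
        mask[idx] = conf[idx] < conf[idx].mean()
    return mask

def penalized_loss(logits, labels, lam=0.5):
    """Cross-entropy plus an extra penalty term over danger samples
    only, so the network attends more to borderline samples."""
    p = softmax(logits)
    ce = -np.log(p[np.arange(len(labels)), labels] + 1e-12)
    mask = danger_mask(logits, labels)
    penalty = ce[mask].mean() if mask.any() else 0.0
    return ce.mean() + lam * penalty
```

Because the penalty term is non-negative, the penalized loss is never smaller than plain cross-entropy; its gradient upweights exactly the low-confidence ("danger") samples.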
Scratches, usually generated while polishing the silicon wafer surface, are one of the major yield-loss factors in the semiconductor manufacturing industry. To determine the source of scratches in real time and reduce yield loss, it is critical for manufacturers to match and identify scratches of the same type automatically. This paper presents an improved K-nearest-neighbors (KNN) algorithm to address this issue. First, a skeleton extraction method is used to depict the main lines of the scratches. A clustering protocol is then applied as a preliminary step to group these main lines so that essential endpoint features, such as distance, slope, and curvature, can be extracted. During feature extraction, a dynamic coordinate system is introduced, which greatly reduces the distortions arising from large differences in tangent magnitude. Finally, the MSML-KNN algorithm for intelligent matching of similar scratches is formulated. Experimental results show that the proposed matching method for wafer scratches has good adaptability and robustness.
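The endpoint-feature extraction and KNN matching steps can be sketched as follows. This is a simplified stand-in, not the MSML-KNN algorithm itself: the feature set is reduced to length and slope angle, and the function names are hypothetical. Representing the slope as an angle via `arctan2` is one way to avoid the tangent-magnitude blow-up that motivates the paper's dynamic coordinate system.

```python
import numpy as np

def line_features(p0, p1):
    """Endpoint features for a scratch main line: length and slope
    angle, measured relative to the first endpoint so the features
    are translation-invariant."""
    d = np.asarray(p1, float) - np.asarray(p0, float)
    length = np.hypot(d[0], d[1])
    angle = np.arctan2(d[1], d[0])  # slope as an angle avoids tan() blow-up
    return np.array([length, angle])

def knn_match(query, catalog, labels, k=3):
    """Match a query scratch to the majority label among its
    k nearest catalog scratches in feature space."""
    dists = np.linalg.norm(catalog - query, axis=1)
    nearest = np.argsort(dists)[:k]
    vals, counts = np.unique(labels[nearest], return_counts=True)
    return vals[np.argmax(counts)]
```

In practice the catalog rows would hold the richer endpoint features (distance, slope, curvature) extracted from the clustered main lines.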
The cover image is based on the Original Article TNF‐α increases the risk of bleeding in patients after CAR T‐cell therapy: A bleeding model based on a real‐world study of Chinese CAR T Working Party by Jiaqian Qi et al., https://doi.org/10.1002/hon.2931.