Deep learning has become increasingly popular in both academic and industrial areas nowadays. Various domains including pattern recognition, Computer vision have witnessed the great power of deep neural networks. However, current studies on deep learning mainly focus on quality data sets with balanced class labels, while training on bad and imbalanced data set have been providing great challenges for classification tasks. We propose in this paper a method of data analysis-based data reduction techniques for selecting good and diversity data samples from a large dataset for a deep learning model. Furthermore, data sampling techniques could be applied to decrease the large size of raw data by retrieving its useful knowledge as representatives. Therefore, instead of dealing with large size of raw data, we can use some data reduction techniques to sample data without losing important information. We group PCB characters in classes and train deep learning on the ResNet56 v2 and SENet model in order to improve the classification performance of optical character recognition (OCR) character classifier.
As the size of components mounted on printed circuit boards (PCBs) decreases, defect detection becomes more important. The first step in an inspection involves recognizing and inspecting characters printed on parts attached to the PCB. In addition, since industrial fields that produce PCBs can change very rapidly, the style of the collected data may vary between collection sites and collection periods. Therefore, flexible learning data that can respond to all fields and time periods are needed. In this paper, large amounts of character data on PCB components were obtained and analyzed in depth. In addition, we proposed a method of recognizing characters by constructing a dataset that was robust with various fonts and environmental changes using a large amount of data. Moreover, a coreset capable of evaluating an effective deep learning model and a base set using n-pick sampling capable of responding to a continuously increasing dataset were proposed. Existing original data and the EfficientNet B0 model showed an accuracy of 97.741%. However, the accuracy of our proposed model was increased to 98.274% for the coreset of 8000 images per class. In particular, the accuracy was 98.921% for the base set with only 1900 images per class.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.