In this paper we propose a new method that uses anomaly detection to assist in labeling data arriving from fast-running processes. As a result, data arriving at high rates can be manually classified and used to train machine learning models. To circumvent the problem of not having a real ground truth, we propose specific metrics for model selection and validation of the results. The use case is taken from the food packaging industry, where processes are affected by regular but short breakdowns that interrupt production. Fast production rates make it hard for machine operators to identify the source and thus the cause of a breakdown. Self-learning assistance systems can help them find the root cause of the problem and assist them in applying lasting solutions. These learning systems need to be trained to identify recurring problems using data analytics. Training is not easy, as the process is too fast to be monitored manually and for individual data points to be labeled with specific classifications.
Much research is done on data analytics and machine learning for data coming from industrial processes. In practice, many pitfalls restrain the application of these modern technologies, especially in brownfield applications. With this paper, we want to show the state of the art and what to expect when working with stock machines in the field. The paper is a review of literature covering challenges for cyber-physical production systems (CPPS) in brownfield applications. This review is combined with our own experience and findings gained while setting up such systems in processing and packaging machines as well as in other areas. A major focus of this paper is data collection, which tends to be more cumbersome than most people might expect. In addition, data quality for machine learning applications becomes a challenge once one leaves the laboratory and its academic data sets. Topics here include missing ground truth and the lack of semantic description of the data. A final challenge covered is IT security and passing data through firewalls to enable the cyber part of CPPS. Overall, these findings show that the potential of data-driven production systems depends strongly on data collection to build the proclaimed new automation systems with more flexibility, improved human–machine interaction, and better process stability and thus less waste during manufacturing.