Predicting hardware failure using machine learning

Chigurupati, Asha; Thibaux, Romain; Lassar, Noah

doi:10.1109/rams.2016.7448033

Cited by 26 publications

(10 citation statements)

References 3 publications


“…However, sequential data was not used, and thus the likely interdependence of features over time was not considered. Other authors predict the time until hardware components fail [15]. Furthermore, in previous research, we showed that LSTM outperformed other algorithms such as FCN or a residual neural network when applied to time series data, but struggled with data imbalance [16].…”
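The imbalance problem mentioned in this excerpt can be made concrete with a little arithmetic. The Python sketch below uses hypothetical counts (1000 test samples, 21 of them broken, matching the 2.1% broken-coil share reported in the abstract) rather than results from either paper. It shows that a model which always predicts "normal" still reaches high accuracy, while its F1-score for the broken class is zero; the function names are illustrative only.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all samples classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (broken) class."""
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined),
        # so F1 is taken as 0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical test set: 1000 samples, 21 broken.
# A trivial model that predicts "normal" for every sample:
print(f"{accuracy(0, 0, 21, 979):.3f}")  # 0.979
print(f"{f1_score(0, 0, 21):.3f}")       # 0.000

# A hypothetical model that finds 18 of the 21 broken samples with 4 false alarms:
print(f"{accuracy(18, 4, 3, 975):.3f}")  # 0.993
print(f"{f1_score(18, 4, 3):.3f}")       # 0.837
```

Accuracy barely separates the two models (0.979 vs 0.993), whereas F1 separates them clearly, which is why F1 is the more informative figure when broken samples are rare.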

Section: Introduction (mentioning)

confidence: 64%

Hardware Failure Prediction on Imbalanced Times Series Data

2021


Magnetic resonance imaging (MRI) systems and their continuous, failure-free operation are crucial for high-quality diagnostics and seamless workflows. Coils are an important hardware component, as they detect the magnetic signal. Before every MRI scan, several image features are captured that represent the condition of the coil in use. These image features, recorded over time, are used to train machine learning models that classify coils as normal or broken for faster and easier maintenance. State-of-the-art techniques for time series classification involve different kinds of neural networks. We leveraged sequential data and trained three models: long short-term memory (LSTM), fully convolutional network (FCN), and their combination, LSTMFCN, as reported by Karim et al. (IEEE Access 6:1662–1669, 2017). We found that LSTMFCN combines the benefits of LSTM and FCN, and it achieved the highest F1-score of 87.45% and the highest accuracy of 99.35%. Furthermore, we tackled the high data imbalance (only 2.1% of the data were collected from broken coils) by training a Gaussian process (GP) regressor and adding predicted sequences as artificial samples to our broken-labelled data. Adding 40 synthetic samples improved the classification results of LSTMFCN to an F1-score of 92.30% and an accuracy of 99.83%. Thus, MRI head/neck coils can be successfully classified as normal or broken by training an LSTMFCN on image features, and augmenting the data with GP-generated samples can improve performance even further.
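The abstract does not state the kernel, features or sampling procedure used for the GP augmentation, so the following is only a minimal sketch under stated assumptions: a GP regressor with a squared-exponential (RBF) kernel, an assumed length scale of 2, an assumed noise variance of 0.05, and hypothetical feature values from one broken coil. It fits the GP to that sequence, predicts between the observed days, and forms each artificial sequence as the posterior mean plus independent Gaussian noise scaled by the posterior standard deviation. Treating time points independently is a simplification of sampling the full joint posterior; all function names are hypothetical.

```python
import math
import random


def rbf(a: float, b: float, length: float = 2.0, var: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two time points."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(m: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with L @ L^T == m, for a symmetric positive definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def forward(low: list[list[float]], b: list[float]) -> list[float]:
    """Solve low @ x = b by forward substitution."""
    x: list[float] = []
    for i, bi in enumerate(b):
        x.append((bi - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def backward(low: list[list[float]], b: list[float]) -> list[float]:
    """Solve low^T @ x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_predict(t_train, y_train, t_test, noise=0.05):
    """Posterior mean and variance of a GP (constant mean = sample mean) at t_test.
    `noise` is the assumed observation-noise variance added to the diagonal."""
    n = len(t_train)
    y_mean = sum(y_train) / n
    centred = [y - y_mean for y in y_train]
    k = [[rbf(t_train[i], t_train[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = cholesky(k)
    alpha = backward(low, forward(low, centred))  # (K + noise*I)^-1 (y - mean)
    means, variances = [], []
    for t in t_test:
        k_star = [rbf(t, ti) for ti in t_train]
        means.append(y_mean + sum(ks * a for ks, a in zip(k_star, alpha)))
        v = forward(low, k_star)
        # Clamp tiny negative values caused by floating-point rounding.
        variances.append(max(rbf(t, t) - sum(vi * vi for vi in v), 0.0))
    return means, variances


def synthetic_sequence(t_train, y_train, t_test, rng: random.Random) -> list[float]:
    """One artificial sequence: posterior mean plus independent Gaussian noise
    scaled by the posterior standard deviation at each time point."""
    means, variances = gp_predict(t_train, y_train, t_test)
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(means, variances)]


# Hypothetical daily values of one image feature from a single broken coil.
t = [float(d) for d in range(8)]
y = [0.91, 0.90, 0.88, 0.85, 0.83, 0.78, 0.74, 0.70]
rng = random.Random(0)
t_new = [d + 0.5 for d in range(7)]  # predict between the observed days
artificial = [synthetic_sequence(t, y, t_new, rng) for _ in range(3)]
for seq in artificial:
    print([round(v, 3) for v in seq])
```

Each generated sequence follows the trend of the original broken-coil sequence with small variations, which is the property that makes such samples usable as extra minority-class training data.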


“…The authors in [19] considered each of the devices and their components separately, labelling them as faulty or not. Thibaux et al. [5] decided to distinguish between three classes: "impending failure detected", "not impending failure detected" and "uncertain about future failure". In this work, the issue was treated as a multiclass classification problem: predicting which of the four components will fail, or that none will.…”
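The multiclass formulation in this excerpt (which of four components fails, or none) requires a label for each observation. A minimal sketch of one common way to build such labels is shown below; it is an assumption, not the cited paper's exact procedure. The record layout, the component names `comp1` to `comp4`, the hour units and the 24-hour horizon are all hypothetical, and when several components fail inside the window the earliest failure is chosen.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    machine_id: int
    time: int        # hours since logging started (hypothetical unit)
    component: str   # e.g. "comp1" ... "comp4" (hypothetical names)


def label_observation(machine_id: int, time: int, failures: list[Failure],
                      horizon: int = 24) -> str:
    """Return the component that fails on this machine within `horizon` hours
    after `time`, or "none" if no failure falls in that window.
    If several components fail in the window, the earliest failure wins."""
    upcoming = [f for f in failures
                if f.machine_id == machine_id and time < f.time <= time + horizon]
    if not upcoming:
        return "none"
    return min(upcoming, key=lambda f: f.time).component


failures = [Failure(1, 100, "comp2"), Failure(1, 300, "comp4"), Failure(2, 50, "comp1")]
print(label_observation(1, 90, failures))   # comp2
print(label_observation(1, 150, failures))  # none
print(label_observation(2, 30, failures))   # comp1
```

The resulting five labels ("comp1" to "comp4" and "none") can then be used as targets for any multiclass classifier.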

Section: Data Structure and Preprocessing (mentioning)

confidence: 99%

Performance Comparison of Machine Learning Algorithms for Predictive Maintenance

Gęca

2020

IAPGOS


The consequences of failures and unscheduled maintenance are the reasons why engineers have been trying to increase the reliability of industrial equipment for years. In modern solutions, predictive maintenance is a frequently used method: it makes it possible to forecast failures and warn of their likelihood. This paper summarizes the machine learning algorithms that can be used in predictive maintenance and compares their performance. The analysis was based on a data set from the Microsoft Azure AI Gallery. The paper presents a comprehensive approach to the problem, including feature engineering, preprocessing, dimensionality reduction techniques, and tuning of model parameters to obtain the highest possible performance. In the analysed case, the best algorithm achieved 99.92% accuracy on over 122 thousand test records. In conclusion, predictive maintenance based on machine learning represents the future of machine reliability in industry.


“…However, very few works have attempted to fully analyze and predict high performance cloud system data empirically using real-time failure data from production. The authors in [18] made a good attempt to analyse the failure data of a large-scale production Cloud environment consisting of over 12,500 servers, including a study of failure and repair times and characteristics for both Cloud workloads and servers, but they never examined the correlation of failures with workload intensity or with system size. The authors in [19] developed a machine learning approach for predicting individual component times until failure, which they reported as far more accurate than the traditional MTBF approach. Their algorithm was built to monitor the health of 14 hardware samples and give notice of an impending failure well ahead of the actual failure, providing adequate time to fix the problem before it occurred.…”
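For contrast with per-component prediction, the traditional MTBF approach mentioned in this excerpt reduces a fleet's history to one number: total operating time divided by the number of failures. The sketch below uses a hypothetical fleet (14 units observed for 2000 hours each, with 7 failures in total, not data from [19]) and additionally assumes a constant failure rate of 1/MTBF, i.e. an exponential lifetime model, to turn that number into a failure probability.

```python
import math


def mtbf(total_operating_hours: float, failures: int) -> float:
    """Mean time between failures: total operating time divided by failure count."""
    return total_operating_hours / failures


def prob_failure_within(hours: float, mtbf_hours: float) -> float:
    """P(failure within `hours`) assuming a constant failure rate 1/MTBF
    (exponential lifetime model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)


# Hypothetical fleet: 14 units x 2000 hours, 7 failures in total.
m = mtbf(14 * 2000, 7)
print(m)                                     # 4000.0
print(round(prob_failure_within(168, m), 4))  # 0.0411, one-week failure probability
```

Under this model every unit receives the same forecast regardless of its current condition, which is the limitation that per-component time-to-failure models aim to overcome by conditioning on each unit's own monitoring data.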

Section: Related Work (mentioning)

confidence: 99%

Failure prediction using machine learning in a virtualised HPC system and application

et al. 2019


Failure is an increasingly important issue in high performance computing and cloud systems. As large-scale systems continue to grow in scale and complexity, mitigating the impact of failure and providing accurate predictions with sufficient lead time remains a challenging research problem. Existing fault-tolerance strategies such as regular checkpointing and replication are not adequate because of the emerging complexities of high performance computing systems. This makes it important to have an effective, proactive failure management approach in place that minimizes the effect of failure within the system. With the advent of machine learning techniques, the ability to learn from past information to predict future patterns of behaviour makes it possible to predict potential system failures more accurately. Thus, in this paper, we explore the predictive abilities of machine learning by applying a number of algorithms to improve the accuracy of failure prediction. We have developed a failure prediction model using time series and machine learning, and performed comparison-based tests on the prediction accuracy. The primary algorithms we considered are Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbors (KNN), Classification and Regression Trees (CART) and Linear Discriminant Analysis (LDA). Experimental results indicate that the average prediction accuracy of our model using SVM when predicting fail
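The abstract compares several classifiers, KNN among them, on time-series failure data but does not describe the features. As a hedged illustration of one possible setup (not the authors' model), the sketch below turns a single hypothetical metric series into fixed-width sliding windows, labels each window 1 if a failure is recorded within the next few steps, and classifies a new window with a from-scratch k-nearest-neighbours majority vote. The window width, horizon, k, the data and the function names are all assumptions.

```python
import math
from collections import Counter

Sample = tuple[list[float], int]


def windows(series: list[float], failure_steps: set[int],
            width: int, horizon: int) -> list[Sample]:
    """Turn one metric series into (features, label) pairs.

    Features are the `width` most recent values; the label is 1 if a failure
    is recorded within the next `horizon` steps, else 0."""
    samples: list[Sample] = []
    for end in range(width, len(series) - horizon + 1):
        features = series[end - width:end]
        label = int(any(end <= s < end + horizon for s in failure_steps))
        samples.append((features, label))
    return samples


def knn_predict(train: list[Sample], x: list[float], k: int = 3) -> int:
    """Majority label among the k training windows closest to x (Euclidean)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical normalised load of one node; failures recorded at steps 7 and 14.
series = [0.2, 0.25, 0.2, 0.3, 0.6, 0.8, 0.95, 0.2,
          0.22, 0.25, 0.2, 0.5, 0.75, 0.9, 0.2, 0.21]
train = windows(series, {7, 14}, width=3, horizon=2)
print(knn_predict(train, [0.55, 0.78, 0.92]))  # 1: resembles the pre-failure ramps
print(knn_predict(train, [0.2, 0.22, 0.21]))   # 0: resembles normal operation
```

The same windowed samples could be fed to any of the other listed classifiers, which is what makes a like-for-like accuracy comparison between them possible.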
