Digital data storage systems such as hard drives can suffer breakdowns that cause the loss of stored data. Because data are valuable and their loss is damaging, hard drive failure prediction is vital. In this context, the objective of this paper is to develop a method for detecting the onset of hard drive malfunction from streaming SMART data, allowing the user to take action before the breakdown occurs. This is a challenging task for two main reasons. First, examples of failed hard drives are usually scarce. Second, in these few available examples, hard drives are identified and labeled as failed only after complete breakdown occurs, while the exact moment when they begin to malfunction is usually unknown. Both aspects significantly complicate the supervised learning of hard drive failure prediction models. To cope with these issues, the problem is addressed as a multidimensional time series streaming classification problem based on sliding windows. Moreover, to handle the highly imbalanced class distribution, the learned classifier is optimized to maximize the minimum recall over the classes. Experimental results using the Backblaze benchmark dataset show that the proposed method reliably anticipates hard drive failures and achieves a better balance between the recall values of both classes, failed and correct disks, than other state-of-the-art solutions.
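The two ingredients named above, sliding windows over a stream and the minimum per-class recall, can be illustrated with a minimal sketch. It is not the paper's classifier or optimization procedure, which the abstract does not specify; the function names `sliding_windows` and `min_recall`, the window width, and the toy labels are all assumptions made for illustration.

```python
from collections import deque

def sliding_windows(stream, width):
    """Yield the latest `width` readings as a tuple once enough have arrived.

    `stream` is any iterable of multidimensional readings (for example,
    one tuple of SMART attribute values per day). Hypothetical helper.
    """
    buf = deque(maxlen=width)
    for reading in stream:
        buf.append(tuple(reading))
        if len(buf) == width:
            yield tuple(buf)

def min_recall(y_true, y_pred):
    """Return the smallest per-class recall over the classes in y_true."""
    recalls = []
    for cls in set(y_true):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return min(recalls)

# Toy example: 0 = correct disk, 1 = failed disk (labels are assumptions).
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0]
print(min_recall(y_true, y_pred))  # 0.5
```

In the toy example the accuracy is 5/6, yet the minimum recall is 0.5 because half of the failed disks are missed. This is why a minimum-recall objective is less forgiving of errors on the rare class than accuracy is.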
There is a need to facilitate access to the required information on the web and to adapt it to users' preferences and requirements. This paper presents a system that, based on a collaborative filtering approach, adapts the web site to improve the user's browsing experience: it automatically generates interesting links for new users. The system uses only the web log files stored on any web server (common log format) and builds user profiles from them by combining machine learning techniques with a generalization process for data representation. These profiles are later used in an exploitation stage to automatically propose links to new users. The paper examines the effect of the system's parameters on its final performance. Experiments show that the designed system performs efficiently on a database accessible from the web; that the generalization process, specificity in profiles, and frequent pattern mining techniques benefit the profile generation phase; and that diversity seems to help in the exploitation phase.
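As a rough illustration of the input such a system starts from, the sketch below parses Common Log Format lines and groups the successfully requested paths by client host. It is only a preprocessing sketch under stated assumptions, not the paper's profile-building method: treating the host field as the user identity, keeping only 2xx responses, and the name `paths_by_host` are all hypothetical choices.

```python
import re
from collections import defaultdict

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+)(?: \S+)?" (\d{3}) (\d+|-)$'
)

def paths_by_host(lines):
    """Group successfully requested paths by client host.

    Assumption: the host field stands in for the user; lines that do not
    match the format, or whose status is not 2xx, are skipped.
    """
    visits = defaultdict(list)
    for line in lines:
        m = CLF.match(line.strip())
        if m is None:
            continue
        host, path, status = m.group(1), m.group(6), m.group(7)
        if status.startswith("2"):
            visits[host].append(path)
    return dict(visits)

# Made-up log lines for illustration only.
sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '127.0.0.1 - frank [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 -',
    '10.0.0.7 - - [10/Oct/2000:13:57:12 -0700] "GET /courses.html HTTP/1.0" 200 512',
]
print(paths_by_host(sample))
```

The per-host path lists produced here are the kind of raw navigation sequences from which profiles could then be mined; how the paper generalizes and clusters them is not described in the abstract.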