In industrial data analytics, a fundamental problem is exploiting the temporal correlation of industrial data to make timely predictions during the production process, such as fault prediction and yield prediction. However, traditional prediction models are fixed while the conditions of the machines change over time, so prediction errors grow as time passes. In this paper, we propose a general data renewal model to address this problem. Combining a similarity function with a loss function, the model estimates when the existing prediction model should be updated, and then updates it iteratively and adaptively according to an evaluation function. We have applied the data renewal model to two prediction algorithms. Experiments demonstrate that the data renewal model can effectively identify changes in the data and update and optimize the prediction model, thereby improving prediction accuracy.
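The renewal trigger described above can be illustrated with a minimal sketch. The abstract does not specify the similarity or loss functions, so the cosine similarity, the threshold values, and all function names below are illustrative assumptions, not the paper's actual method:

```python
import numpy as np

def cosine_similarity(a, b):
    # Similarity between the training-time data window and a recent window.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def needs_renewal(train_window, recent_window, recent_loss,
                  sim_threshold=0.9, loss_threshold=0.5):
    """Decide whether the existing prediction model should be updated.

    Renewal is triggered when the recent data no longer resembles the
    training data (low similarity) or the prediction loss has grown
    beyond an acceptable level. Thresholds are illustrative placeholders.
    """
    sim = cosine_similarity(train_window, recent_window)
    return sim < sim_threshold or recent_loss > loss_threshold
```

In an adaptive loop, `needs_renewal` would be evaluated after each batch of production data, and a `True` result would trigger retraining on the newest window.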
Subsequence matching is an important and fundamental problem on time series data. This paper studies the inherent time complexity of the subsequence matching problem and designs a more efficient algorithm for solving it. First, it is proved that, if the SETH hypothesis is true, the subsequence matching problem cannot be solved in O(n^(1-δ)) time even when polynomial-time preprocessing is allowed, where n is the size of the input time series and 0 ≤ δ < 1; that is, the inherent complexity of the subsequence matching problem is ω(n^(1-δ)). Second, an efficient algorithm for the subsequence matching problem is proposed. To improve its efficiency, we design a new summarization method as well as a novel index for series data. The proposed algorithm supports both Euclidean distance and DTW distance, with or without z-normalization. Experimental results show that the proposed algorithm is up to about 3~10 times faster than the state-of-the-art algorithm on the constrained z-normalized Euclidean distance and DTW distance, and up to 7~12 times faster on Euclidean distance.
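To make the matching problem concrete, here is a brute-force baseline for z-normalized Euclidean subsequence matching, the O(n·m) scan that the paper's summarization and index are designed to beat. This is a generic sketch, not the proposed algorithm:

```python
import numpy as np

def znorm(x):
    # Z-normalization: shift to zero mean, scale to unit std deviation.
    mu, sigma = x.mean(), x.std()
    return (x - mu) / sigma if sigma > 0 else x - mu

def best_match(series, query):
    """Return (start index, distance) of the closest z-normalized
    Euclidean match of `query` among all length-m subsequences of
    `series`. Brute force: every subsequence is normalized and compared.
    """
    s = np.asarray(series, dtype=float)
    q = znorm(np.asarray(query, dtype=float))
    m = len(q)
    best_i, best_d = None, float("inf")
    for i in range(len(s) - m + 1):
        d = float(np.linalg.norm(znorm(s[i:i + m]) - q))
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d
```

Because z-normalization is scale- and offset-invariant, the query `[0, 2, 6]` matches the subsequence `[0, 1, 3]` exactly (one is a scaled copy of the other), which is precisely the kind of match a raw Euclidean scan would miss.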
Knowledge bases (KBs) are an important component of artificial intelligence. One significant challenge in KB construction is that KBs contain much noise, which prevents their effective use. Although some KB cleansing algorithms have been proposed, they focus on the structure of the knowledge graph and neglect the relations between concepts, which could help discover wrong relations in a KB. Motivated by this, we measure the relation between two concepts by the distance between their corresponding instances and detect errors within the intersection of the conflicting concept sets. For efficient and effective knowledge base cleansing, we first apply a distance-based model to determine the conflicting concept sets using two different methods. We then propose and analyze several algorithms for detecting and repairing the errors based on our model, using a hash method to calculate distance efficiently. Experimental results demonstrate that the proposed approaches can cleanse knowledge bases efficiently and effectively.
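The hash-based distance idea can be sketched as follows. The abstract does not say which hash method is used, so this sketch assumes MinHash signatures to estimate Jaccard similarity between the instance sets of two concepts; the threshold and all names are illustrative:

```python
import hashlib

def minhash_signature(instances, num_hashes=64):
    """MinHash signature of a concept's instance set (illustrative).
    Each seeded hash keeps the minimum value over all instances."""
    return [
        min(int(hashlib.md5(f"{seed}:{it}".encode()).hexdigest(), 16)
            for it in instances)
        for seed in range(num_hashes)
    ]

def estimated_jaccard(sig_a, sig_b):
    # Fraction of agreeing signature positions estimates Jaccard similarity.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

def conflicting(concept_a, concept_b, threshold=0.5):
    """Flag two concepts whose instance sets are close (high estimated
    Jaccard similarity); instances in their intersection are candidate
    errors to inspect. Threshold is an illustrative placeholder."""
    sim = estimated_jaccard(minhash_signature(concept_a),
                            minhash_signature(concept_b))
    return sim >= threshold, set(concept_a) & set(concept_b)
```

Signatures are computed once per concept, so comparing all concept pairs costs only signature comparisons rather than full set intersections, which is the efficiency gain hashing provides here.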