The representation and similarity measurement of time series underpin time series research and are important for improving the efficiency and accuracy of time series data mining. This paper presents a shape-based discrete symbolic representation and an accompanying distance measure for assessing the similarity between time series. The method quantitatively represents changes in the shape of a time series. Compared with existing similar approaches, it is more intuitive and compact, and it is insensitive to offset translation, amplitude scaling, compression, and stretching. It reflects the degree of dynamic change in the trend, suppresses the influence of noise, and classifies patterns in finer detail, which helps improve clustering accuracy; it also has a multi-scale character. Experimental results show that the approach is effective in clustering and satisfies the requirements of shape similarity for time series at various analysis frequencies.
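The abstract does not give the symbol alphabet or the distance formula, so the following is only a minimal sketch of the general idea of a shape-based symbolic representation, not the authors' method. It assumes z-normalization to remove offset and amplitude effects, a time axis rescaled to [0, 1] to reduce sensitivity to compression and stretching, and a three-symbol alphabet (falling, flat, rising) derived from per-segment slopes. The names `shape_symbols` and `symbol_distance`, the segment count, and the slope thresholds are all hypothetical.

```python
import numpy as np

def shape_symbols(series, n_segments=8, thresholds=(-1.0, 1.0)):
    """Map a series to one slope symbol per segment (0=falling, 1=flat, 2=rising).

    Assumed design, not taken from the paper: z-normalize the values,
    rescale time to [0, 1], then discretize least-squares segment slopes.
    """
    x = np.asarray(series, dtype=float)
    if len(x) < 2 * n_segments:
        raise ValueError("need at least two points per segment")
    std = x.std()
    # z-normalization removes offset translation and amplitude scaling.
    z = (x - x.mean()) / std if std > 0 else np.zeros_like(x)
    # A normalized time axis makes slopes comparable across series lengths.
    t = np.linspace(0.0, 1.0, len(x))
    slopes = [
        np.polyfit(ts, zs, 1)[0]
        for ts, zs in zip(np.array_split(t, n_segments),
                          np.array_split(z, n_segments))
    ]
    return np.digitize(slopes, thresholds)

def symbol_distance(a, b):
    """Mean absolute difference between two equal-length symbol sequences."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(np.abs(a - b)))

if __name__ == "__main__":
    t = np.linspace(0, 2 * np.pi, 64)
    base = np.sin(t)
    shifted_scaled = 3 * base + 5  # same shape, different offset and amplitude
    print(shape_symbols(base))
    print(symbol_distance(shape_symbols(base), shape_symbols(shifted_scaled)))
```

Changing `n_segments` gives a coarser or finer view of the same series, which is one simple way to approximate analysis at several scales under these assumptions.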
The Vector Space Model (VSM) is commonly used to represent text features in text mining, but the resulting vectors are very high-dimensional, do not clearly reveal the structure of the text collection, and are costly to compute with. A new text clustering algorithm based on projection pursuit is proposed. By minimizing (or maximizing) a projection index, projection pursuit searches for an optimal projection direction and projects text feature vectors from the high-dimensional space into a low-dimensional (1 to 3 dimensions) space. The linear and non-linear structures and features of the original high-dimensional data can be expressed by their projection weights along the optimal projection direction. The optimal projection direction is found with a genetic algorithm, and the distribution of the texts can be visualized. Unlike k-means clustering, projection pursuit based text clustering does not require the number of clusters to be set in advance, and unlike latent semantic analysis, which discovers only linear structure, it also reveals non-linear structure. Experiments demonstrate that the algorithm clusters texts effectively.
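The search described above can be sketched as follows. This is an illustration under stated assumptions rather than the paper's algorithm: the abstract names neither the projection index nor the genetic operators, so the sketch assumes a one-dimensional projection, an index in the style of the Friedman-Tukey index (spread of the projected values times their local density), and a very simple genetic algorithm with truncation selection, blend crossover, and Gaussian mutation. The function names, `radius_scale`, and all default parameters are hypothetical, and `X` stands in for any document-by-term feature matrix.

```python
import numpy as np

def projection_index(z, radius_scale=0.1):
    """Assumed index: spread of the projection times its local density."""
    s = z.std()
    r = np.abs(z[:, None] - z[None, :])   # pairwise distances in the projection
    radius = radius_scale * s             # window radius (assumed choice)
    density = np.sum((radius - r) * (r < radius))
    return s * density

def unit(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def pursue_direction(X, pop_size=40, generations=60, mutation=0.1, seed=0):
    """Search for a unit direction that maximizes the projection index."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Min-max scale each feature so that no single term dominates.
    X = (X - X.min(axis=0)) / (np.ptp(X, axis=0) + 1e-12)
    pop = unit(rng.normal(size=(pop_size, X.shape[1])))
    for _ in range(generations):
        fitness = np.array([projection_index(X @ a) for a in pop])
        # Truncation selection: the better half survives unchanged.
        elite = pop[np.argsort(fitness)[::-1][: pop_size // 2]]
        n_children = pop_size - len(elite)
        i = rng.integers(0, len(elite), size=n_children)
        j = rng.integers(0, len(elite), size=n_children)
        w = rng.random((n_children, 1))
        # Blend crossover followed by Gaussian mutation.
        children = w * elite[i] + (1 - w) * elite[j]
        children += mutation * rng.normal(size=children.shape)
        pop = np.vstack([elite, unit(children)])
    fitness = np.array([projection_index(X @ a) for a in pop])
    best = pop[np.argmax(fitness)]
    return best, X @ best

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two synthetic groups of "documents" in a 6-term feature space.
    X = np.vstack([rng.normal(0, 1, (25, 6)), rng.normal(4, 1, (25, 6))])
    direction, projected = pursue_direction(X)
    print(direction)
    print(np.sort(projected))
```

Plotting or sorting the projected values is what allows groups to be read off without fixing the number of clusters beforehand; projecting into 2 or 3 dimensions would require searching for additional directions, which this sketch does not attempt.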