Mining novel knowledge from digital data is a nontrivial task that has received much attention in recent years. However, such data remain difficult to handle, especially when large volumes of values must be analyzed. In our work, we focus on biological data from DNA chips, which biologists study in order to discover new gene correlations that could help in understanding diseases such as breast cancer. In this framework, we consider the values from DNA microarrays, which convey the behavior of some genes, and we want to discover how these behaviors are correlated. These data are digital values that can be ordered and sorted. In previous work, sequential patterns such as (1 5)(2) have been discovered, meaning that genes 1 and 5 have the same expression level, followed by gene 2, which has a higher expression value. However, such data are very noisy, and treating close values as strictly ordered is often misleading. We therefore consider fuzzy rankings based on a fuzzy partition provided by the experts. The resulting rules can then better characterize how genes are correlated.
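To make the idea of a pattern such as (1 5)(2) under a fuzzy ranking more concrete, the following is a minimal sketch and not the authors' method. It assumes expression values normalised to [0, 1], a hypothetical three-set partition (low, medium, high) with illustrative breakpoints, and a ranking that assigns each value to the set with the highest membership degree. The names `memberships`, `fuzzy_rank`, and `supports` are invented for this illustration; in practice the partition would come from the experts.

```python
# Hypothetical fuzzy partition of normalised expression values in [0, 1].
# The breakpoints (0.25, 0.5, 0.75) are illustrative assumptions only.
LABELS = ["low", "medium", "high"]


def memberships(x):
    """Return the membership degree of x in each fuzzy set (degrees sum to 1)."""
    x = min(max(x, 0.0), 1.0)  # clamp to the assumed [0, 1] range
    if x <= 0.25:
        return {"low": 1.0, "medium": 0.0, "high": 0.0}
    if x <= 0.5:
        m = (x - 0.25) / 0.25
        return {"low": 1.0 - m, "medium": m, "high": 0.0}
    if x <= 0.75:
        h = (x - 0.5) / 0.25
        return {"low": 0.0, "medium": 1.0 - h, "high": h}
    return {"low": 0.0, "medium": 0.0, "high": 1.0}


def fuzzy_rank(x):
    """Index of the label with the highest membership (ties go to the lower label)."""
    degrees = memberships(x)
    return max(range(len(LABELS)), key=lambda i: degrees[LABELS[i]])


def supports(pattern, expression):
    """Check whether one sample supports a sequential pattern.

    pattern: list of itemsets of gene ids, e.g. [[1, 5], [2]] for (1 5)(2).
    expression: dict mapping gene id to its normalised expression value.
    Genes in the same itemset must share a fuzzy rank, and each itemset
    must have a strictly higher rank than the one before it.
    """
    previous_rank = None
    for itemset in pattern:
        ranks = {fuzzy_rank(expression[gene]) for gene in itemset}
        if len(ranks) != 1:
            return False
        rank = ranks.pop()
        if previous_rank is not None and rank <= previous_rank:
            return False
        previous_rank = rank
    return True


if __name__ == "__main__":
    sample = {1: 0.30, 5: 0.34, 2: 0.80}
    print(supports([[1, 5], [2]], sample))
```

In this sketch, genes 1 and 5 have slightly different raw values (0.30 and 0.34), so a crisp ordering would rank gene 5 above gene 1, whereas the fuzzy ranking places both in the same set and the sample counts as supporting (1 5)(2).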
Nowadays, managing sequential pattern data is an increasing need in biological knowledge discovery processes. An important task in these processes is the presentation of the results obtained with data mining methods. In a domain as complex as biomedicine, interpreting the patterns efficiently without any assistance is difficult. One of the most common knowledge discovery processes is clustering, but applying clustering to gene sequential patterns is far from easy on biomedical data. In this paper, we introduce a new similarity function and a summarization algorithm for gene sequential patterns.