To discover relationships and associations rapidly in large-scale datasets, we propose a cross-platform tool that computes the maximal information coefficient (MIC) rapidly using parallel computing. Parallel processing lets the tool analyze large-scale biological datasets in markedly less computing time. Experimental results show that the tool is fast enough to perform an all-pairs analysis of a large biological dataset on an ordinary computer. The source code and guidelines can be downloaded from https://github.com/HelloWorldCN/RapidMic.
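The all-pairs decomposition described above can be illustrated with a minimal sketch. It is not the RapidMic implementation: the function names `pearson_r2` and `all_pairs` are hypothetical, the squared Pearson correlation is only a stand-in score (it is not the maximal information coefficient), and a thread pool is used purely for portability. A CPU-bound score would more plausibly be spread over processes or native threads.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def pearson_r2(x, y):
    """Stand-in association score (squared Pearson correlation), NOT MIC."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no linear association
    return (sxy * sxy) / (sxx * syy)


def all_pairs(variables, score=pearson_r2, workers=4):
    """Score every unordered pair of variables, spreading pairs over a pool."""
    names = sorted(variables)
    pairs = list(combinations(names, 2))

    def score_pair(pair):
        a, b = pair
        return pair, score(variables[a], variables[b])

    # Each pair is independent, so the work divides cleanly across workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score_pair, pairs))


# Hypothetical toy data: three variables measured on four samples.
data = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [1.0, -1.0, 1.0, -1.0],
}
result = all_pairs(data)
```

Because the pairs are independent, the number of tasks grows as n(n-1)/2 for n variables, which is why an all-pairs analysis benefits from parallel execution.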
Accurate region of interest (ROI) extraction plays an important role in both finger vein recognition systems and finger vein-based cryptography systems. To localize the rectangular ROI accurately, the edges of the finger and a line in the finger joint region must be detected accurately as reference positions. Because most existing finger edge detection methods do not work well, a robust finger edge detection method is proposed in this paper. An inner line of the finger is first detected to divide the finger vein image into two parts; two edge detection templates and a series of techniques such as interpolation and fitting are then used to detect the finger edges and correct wrongly detected ones. Furthermore, because the brighter finger joint region is irregular in shape, sliding windows of several shapes (rectangle, disk, diamond, and ellipse) are generated to detect the reference line of the finger joint. Finally, a contour similarity distance-based method is introduced to evaluate the performance of the various sliding windows. The experimental results show that the proposed edge detection method detects the finger edges in 100% of the images in our finger vein image database. Among the detection windows, the ellipse window is the most suitable for detecting the finger joint reference line. Overall, the proposed ROI extraction method for finger vein images has better overall performance than the other methods.
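The sliding-window idea for the joint reference line can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: it assumes the finger lies horizontally (so the reference line is a column), uses only a full-height rectangular window, and scores each window by total brightness. The function name `joint_reference_column` and the toy image are hypothetical; disk, diamond, or ellipse windows would replace the rectangle with a shaped mask.

```python
def joint_reference_column(image, window_width):
    """Return the centre column of the brightest full-height rectangular window."""
    n_cols = len(image[0])
    if not 1 <= window_width <= n_cols:
        raise ValueError("window_width must be between 1 and the image width")
    # Column sums let each window be scored with a running total.
    col_sums = [sum(row[c] for row in image) for c in range(n_cols)]
    window = sum(col_sums[:window_width])
    best_sum, best_start = window, 0
    for start in range(1, n_cols - window_width + 1):
        # Slide one column right: add the entering column, drop the leaving one.
        window += col_sums[start + window_width - 1] - col_sums[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start + window_width // 2


# Hypothetical grayscale patch: the bright band in columns 3-5 mimics a joint.
image = [
    [10, 12, 11, 40, 45, 42, 12, 10],
    [11, 10, 13, 38, 47, 41, 11, 12],
    [9, 11, 12, 41, 44, 40, 10, 11],
]
column = joint_reference_column(image, window_width=3)
```

On this toy patch the brightest three-column window starts at column 3, so the sketch reports column 4 as the reference line.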
Protein databases are growing rapidly, so clustering protein sequences based only on sequence information is becoming increasingly important. In this paper, we analyze the limitations of the affinity propagation (AP) algorithm when clustering a randomly generated dataset. We then propose a post-processing method to improve the AP algorithm. The method uses the median of the input similarities as the shared preference value and then applies a post-processing phase that combines merging and reassignment strategies to the results of the AP algorithm. We tested our method extensively and compared its performance with five other methods on several datasets from the COG (Clusters of Orthologous Groups of proteins) database, SCOP, and the G-protein family. The number of clusters obtained for a given set of proteins approximates the correct number of clusters in that set. Moreover, in our experiments, the quality of the clusters as quantified by the F-measure was better than that of the other methods (on average, 9% better than BlastClust, 33% better than TribeMCL, 34% better than CLUSS, 59% better than Spectral clustering and 41% better than AP).
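Two quantities mentioned above can be made concrete with a short sketch: the shared preference taken as the median of the input similarities, and an F-measure for scoring a clustering against known classes. Both functions are illustrative assumptions. In particular, `shared_preference` takes the median over off-diagonal entries only, and `clustering_f_measure` uses the class-size-weighted best-match definition, which is one common variant and may differ from the one used in the paper. The merging and reassignment post-processing is not shown.

```python
from collections import Counter
from statistics import median


def shared_preference(similarity):
    """Median of the off-diagonal similarities, used as one preference for all points."""
    n = len(similarity)
    return median(similarity[i][j] for i in range(n) for j in range(n) if i != j)


def clustering_f_measure(true_labels, predicted_labels):
    """Class-size-weighted best-match F-measure (one common definition; assumed here)."""
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(predicted_labels)
    overlap = Counter(zip(true_labels, predicted_labels))
    total = 0.0
    for cls, size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            common = overlap[(cls, clu)]
            if common:
                precision = common / clu_size
                recall = common / size
                best = max(best, 2 * precision * recall / (precision + recall))
        # Each class contributes its best-matching cluster, weighted by class size.
        total += size / n * best
    return total


# Hypothetical similarity matrix (negative distances) for three sequences.
similarity = [
    [0.0, -1.0, -4.0],
    [-1.0, 0.0, -2.0],
    [-4.0, -2.0, 0.0],
]
preference = shared_preference(similarity)

# Hypothetical labels: two true families, two predicted clusters.
score = clustering_f_measure(["a", "a", "b", "b"], [0, 0, 0, 1])
```

A single shared preference gives every sequence the same prior chance of becoming an exemplar, so the number of clusters is driven by the similarities rather than by per-point tuning.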