Background: DNA microarray technology allows the expression levels of thousands of genes to be measured under tens to hundreds of different conditions. In microarray data, genes with similar functions usually co-express under certain conditions only [1]. Thus, biclustering, which clusters genes and conditions simultaneously, is preferred over traditional clustering for discovering these coherent genes. Various biclustering algorithms have been developed using different bicluster formulations. Unfortunately, many useful formulations result in NP-complete problems. In this article, we investigate an efficient method for identifying a popular type of bicluster, the additive model. Furthermore, parallel coordinate (PC) plots are used for bicluster visualization and analysis.
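As an illustration of the additive model, the sketch below assumes the usual formulation in which each entry of a bicluster is approximately a constant plus a gene (row) effect plus a condition (column) effect. It scores a block with the mean squared residue, a commonly used coherence score; this is not necessarily the detection method investigated in the article, and the function name and example matrices are hypothetical.

```python
def additive_residue(block):
    """Mean squared residue of a block under the additive model
    a[i][j] ~ mu + alpha_i + beta_j. A value of 0.0 means the block
    fits the additive model exactly."""
    n_rows, n_cols = len(block), len(block[0])
    row_means = [sum(row) / n_cols for row in block]
    col_means = [sum(block[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row and column effects.
            r = block[i][j] - row_means[i] - col_means[j] + overall
            total += r * r
    return total / (n_rows * n_cols)


# Hypothetical data: every row is the first row shifted by a constant.
perfect = [[1.0, 3.0, 2.0],
           [2.0, 4.0, 3.0],
           [0.0, 2.0, 1.0]]

# Same block with one entry perturbed, which breaks the additive pattern.
noisy = [[1.0, 3.0, 2.0],
         [2.0, 9.0, 3.0],
         [0.0, 2.0, 1.0]]

print(additive_residue(perfect))
print(additive_residue(noisy))
```

In a parallel coordinate plot, the rows of a perfectly additive block would appear as vertically shifted copies of the same polyline, which is the pattern the residue measures numerically.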
DNA microarray experiments inevitably generate gene expression data with missing values. An important and necessary pre-processing step is thus to impute these missing values. Existing imputation methods exploit gene correlation across all experimental conditions to estimate the missing values. However, related genes co-express in subsets of experimental conditions only. In this paper, we propose to use biclusters, which contain similar genes under a subset of conditions, to characterize gene similarity and then estimate the missing values. To further improve the accuracy of missing value estimation, an iterative framework is developed with a stopping criterion based on minimizing uncertainty. Extensive experiments have been conducted on artificial datasets, real microarray datasets, and one non-microarray dataset. Our proposed bicluster-based approach is able to reduce errors in missing value estimation.
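The idea of estimating a missing value from the coherence inside a bicluster can be sketched as follows. This is a minimal illustration, not the authors' algorithm: it assumes the bicluster is already known and follows an additive pattern, estimates each missing entry as row mean plus column mean minus overall mean, and stops when the estimates stop changing (a simple tolerance stands in for the paper's uncertainty-based stopping criterion). The function name, parameters, and data are hypothetical.

```python
def impute_additive(block, tol=1e-9, max_iter=200):
    """Iteratively fill None entries of a bicluster assumed to follow
    the additive model a[i][j] ~ row_mean_i + col_mean_j - overall_mean."""
    n_rows, n_cols = len(block), len(block[0])
    missing = [(i, j) for i in range(n_rows) for j in range(n_cols)
               if block[i][j] is None]
    observed = [v for row in block for v in row if v is not None]
    start = sum(observed) / len(observed)
    # Initial guess: the mean of all observed entries.
    filled = [[start if v is None else v for v in row] for row in block]

    for _ in range(max_iter):
        row_means = [sum(row) / n_cols for row in filled]
        col_means = [sum(filled[i][j] for i in range(n_rows)) / n_rows
                     for j in range(n_cols)]
        overall = sum(row_means) / n_rows
        change = 0.0
        for i, j in missing:
            estimate = row_means[i] + col_means[j] - overall
            change = max(change, abs(estimate - filled[i][j]))
            filled[i][j] = estimate
        if change < tol:  # assumed stopping rule: estimates have converged
            break
    return filled


# Hypothetical additive bicluster whose centre entry (true value 4.0) is missing.
block = [[1.0, 3.0, 2.0],
         [2.0, None, 3.0],
         [0.0, 2.0, 1.0]]

filled = impute_additive(block)
print(filled[1][1])
```

A single pass gives a biased estimate because the row and column means are themselves computed from incomplete data; repeating the update lets the estimate and the means correct each other, which is the motivation for an iterative scheme.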
The multiscale directional filter bank (MDFB) improves the radial frequency resolution of the contourlet transform by introducing an additional decomposition in the high-frequency band. The increase in frequency resolution is particularly useful for texture description because of the quasi-periodic property of textures. However, the MDFB needs an extra set of scale and directional decompositions, which is performed at the full image size. The rise in computational complexity is, thus, prominent. In this paper, we develop an efficient implementation framework for the MDFB. In the new framework, directional decomposition on the first two scales is performed prior to the scale decomposition. This allows the directional decomposition to be shared between the two scales and, hence, reduces the computational complexity significantly. Based on this framework, two fast implementations of the MDFB are proposed. The first one maintains the same flexibility in directional selectivity in the first two scales, while the other has the same redundancy ratio as the contourlet transform. Experimental results show that the first and the second schemes reduce the computational time by 33.3%-34.6% and 37.1%-37.5%, respectively, compared to the original MDFB algorithm. Meanwhile, the texture retrieval performance of the proposed algorithms is comparable to that of the original MDFB approach, which outperforms the steerable pyramid and the contourlet transform approaches.
Traditionally, intra-sequence similarity is exploited for compressing a single DNA sequence. Recently, remarkable compression performance has been achieved for an individual DNA sequence from the same population by encoding its difference from a nearly identical reference sequence. Nevertheless, there is a lack of general algorithms that also allow less similar reference sequences. In this work, we extend intra-sequence similarity to inter-sequence similarity, in that approximate matches of subsequences are found between the DNA sequence and a set of reference sequences. Hence, a set of nearly identical DNA sequences from the same population, or a set of partially similar DNA sequences such as chromosome sequences and DNA sequences of related species, can be compressed together. For practical compressors, the compressed size is usually influenced by the order in which the sequences are compressed. Fast search algorithms for the optimal compression order are thus developed for multiple-sequence compression. Experimental results on artificial and real datasets demonstrate that our proposed multiple-sequence compression methods with fast compression order search achieve good compression performance under different levels of similarity among the DNA sequences.
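The benefit of a similar reference sequence can be illustrated with a general-purpose compressor. The sketch below is not the method proposed in the paper: it swaps in DEFLATE with a preset dictionary (Python's zlib `zdict` option), so that matches against the reference are found by the compressor itself. The sequences are synthetic, the helper names are hypothetical, and DEFLATE's 32 KB window means this only works for short sequences.

```python
import random
import zlib


def compress_with_reference(target: bytes, reference: bytes) -> bytes:
    """Compress target using reference as a DEFLATE preset dictionary."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(target) + comp.flush()


def decompress_with_reference(blob: bytes, reference: bytes) -> bytes:
    """Inverse of compress_with_reference; needs the same reference."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(blob) + decomp.flush()


# Synthetic data: a random reference and a target with 10 substitutions.
random.seed(0)
reference = bytes(random.choice(b"ACGT") for _ in range(2000))
mutated = bytearray(reference)
for pos in random.sample(range(len(mutated)), 10):
    # Replace the base with a different one so every mutation is real.
    mutated[pos] = ord("A") if mutated[pos] != ord("A") else ord("C")
target = bytes(mutated)

alone = zlib.compress(target, 9)
with_ref = compress_with_reference(target, reference)
restored = decompress_with_reference(with_ref, reference)

print(len(alone), len(with_ref), restored == target)
```

When several sequences are compressed one after another, each already-compressed sequence can serve as a reference for the next, which is why the compression order affects the total size.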
DNA microarray data always contains missing values. As subsequent analysis such as biclustering can only be applied to complete data, these missing values have to be imputed before any biclusters can be detected. Existing imputation methods exploit coherence among expression values in the microarray data. Given that biclustering attempts to find correlated expression values within the data, we propose to combine missing value imputation and biclustering into a single framework in which the two processes are performed iteratively. In this way, missing value imputation can improve bicluster analysis, and the coherence in the detected biclusters can be exploited for better missing value estimation. Experiments have been conducted on artificial and real datasets to verify the effectiveness of the proposed algorithm in reducing estimation errors of missing values.