We used data sets consisting of miRNA-target gene binding information and expression profiles of miRNAs and mRNAs on human cancer samples. Our method allowed us to detect functionally correlated miRNA-mRNA modules involved in specific biological processes from multiple data sources by using a balanced fitness function and efficient searching over multiple populations. The proposed algorithm found two miRNA-mRNA modules, highly correlated with respect to their expression and biological function. Moreover, the mRNAs included in the same module showed much higher correlations when the related miRNAs were highly expressed, demonstrating our method's ability for finding coherent miRNA-mRNA modules. Most members of these modules have been reported to be closely related with cancer. Consequently, our method can provide a primary source of miRNA and target sets presumed to constitute closely related parts of gene regulatory pathways.
Comprehensive and accurate detection of variants from whole-genome sequencing (WGS) is a strong prerequisite for translational genomic medicine; however, low concordance between analytic pipelines is an outstanding challenge. We processed a European and an African WGS samples with 70 analytic pipelines comprising the combination of 7 short-read aligners and 10 variant calling algorithms (VCAs), and observed remarkable differences in the number of variants called by different pipelines (max/min ratio: 1.3~3.4). The similarity between variant call sets was more closely determined by VCAs rather than by short-read aligners. Remarkably, reported minor allele frequency had a substantial effect on concordance between pipelines (concordance rate ratio: 0.11~0.92; Wald tests,
P
< 0.001), entailing more discordant results for rare and novel variants. We compared the performance of analytic pipelines and pipeline ensembles using gold-standard variant call sets and the catalog of variants from the 1000 Genomes Project. Notably, a single pipeline using BWA-MEM and GATK-HaplotypeCaller performed comparable to the pipeline ensembles for ‘callable’ regions (~97%) of the human reference genome. While a single pipeline is capable of analyzing common variants in most genomic regions, our findings demonstrated the limitations and challenges in analyzing rare or novel variants, especially for non-European genomes.
Classification of patient samples is a crucial aspect of cancer diagnosis. DNA hybridization arrays simultaneously measure the expression levels of thousands of genes and it has been suggested that gene expression may provide the additional information needed to improve cancer classification and diagnosis. This paper presents methods for analyzing gene expression data to classify cancer types. Machine learning techniques, such as Bayesian networks, neural trees, and radial basis function (RBF) networks, are used for the analysis of the CAMDA Data Set 2. These techniques have their own properties including the ability of finding important genes for cancer classification, revealing relationships among genes, and classifying cancer. This paper reports on comparative evaluation of the experimental results of these methods.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.