Computed tomography (CT) examinations are commonly used to predict lung nodule malignancy in patients and have been shown to improve noninvasive early diagnosis of lung cancer. It remains challenging for computational approaches to achieve performance comparable to that of experienced radiologists. Here we present NoduleX, a systematic approach to predicting lung nodule malignancy from CT data, based on deep learning convolutional neural networks (CNN). For training and validation, we analyzed more than 1,000 lung nodules in images from the LIDC/IDRI cohort. All nodules were identified and classified by four experienced thoracic radiologists who participated in the LIDC project. NoduleX achieves high accuracy for nodule malignancy classification, with an AUC of ~0.99. This is commensurate with the analysis of the dataset by experienced radiologists. NoduleX provides an effective framework for highly accurate nodule malignancy prediction, with the model trained on a large patient population. Our results are replicable with software available at http://bioinformatics.astate.edu/NoduleX.
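To make the reported AUC figure concrete, the following minimal Python sketch computes the area under the ROC curve as the fraction of (malignant, benign) pairs in which the malignant case receives the higher score. The function name `auc` and the toy scores are illustrative assumptions; they are not taken from NoduleX or the LIDC/IDRI data.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted malignancy scores (higher = more likely malignant)
    labels: 1 for malignant, 0 for benign
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # malignant case ranked above benign case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example (made-up scores): 3 of the 4 pairs are ranked correctly.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
print(auc(toy_scores, toy_labels))  # 0.75
```

An AUC of ~0.99 therefore means that nearly every malignant nodule is scored above nearly every benign one.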
Next-generation sequencing is empowering genetic disease research. However, it also brings significant challenges for efficient and effective sequencing data analysis. We built a pipeline, called DNAp, for analyzing whole exome sequencing (WES) and whole genome sequencing (WGS) data to detect mutations in disease samples. The pipeline is containerized and convenient to use, and it can run on any system that supports Docker, since it is a fully automated process packaged as a Docker container. It is also open and can be easily customized through user intervention points, such as updating reference files or swapping in different software or versions. The pipeline has been tested with both human and mouse sequencing datasets, and it has generated mutation results that are comparable to published results from these datasets and reproducible across heterogeneous hardware platforms. DNAp, funded by the US Food and Drug Administration (FDA), was developed for analyzing DNA sequencing data at the FDA. Here we make DNAp open source, with the software and documentation available to the public at http://bioinformatics.astate.edu/dna-pipeline/.
Introduction: Mining minor QTLs plays a very important role in genomic selection, pathway analysis and trait development in agricultural and biological research. Since most individual loci contribute little to complex trait variation, it remains a challenge for traditional statistical methods to identify minor QTLs with subtle phenotypic effects. Here we applied a new framework that combines GWAS analysis with machine learning feature selection to explore new ways of mining minor QTLs. Results: We studied the soybean branching trait using 2,137 accessions from the soybean (Glycine max) diversity panel, which was genotyped with 50k SNP chips yielding 42,080 valid SNPs. First, as a baseline study, we conducted a GWAS GAPIT analysis, which identified only one SNP marker significantly associated with soybean branching. We then combined the GWAS analysis with feature importance analysis based on Random Forest score analysis and permutation analysis. Our results showed that 36,077 features (SNPs) were identified by Random Forest score analysis and 2,098 features (SNPs) by permutation analysis. In total, 1,770 features (SNPs) were confirmed by both the Random Forest score analysis and the permutation analysis. Based on our analysis, 328 genes related to branching development were identified. A further GO (gene ontology) term enrichment analysis was applied to these 328 genes, and the locations and expression of the identified genes are provided. Conclusions: We find that the combined analysis with GWAS and machine learning feature selection shows significant identification power for mining minor QTLs. The presented results will help in understanding the biological activities that lie between genotype and phenotype in terms of causal networks of interacting genes. This study will potentially contribute to effective genomic selection in plant breeding and help broaden the scope of molecular breeding in plants.
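The idea of keeping only SNPs confirmed by two independent importance analyses can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' pipeline: the genotype matrix is made up, `predict` is a hypothetical fixed rule standing in for a trained model, and `mean_difference_score` is a simple stand-in for the Random Forest score, which is not reproduced here.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when a single feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def mean_difference_score(rows, labels):
    """Stand-in score (assumption): |mean genotype in class 1 - class 0|."""
    scores = []
    for j in range(len(rows[0])):
        ones = [r[j] for r, y in zip(rows, labels) if y == 1]
        zeros = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(sum(ones) / len(ones) - sum(zeros) / len(zeros)))
    return scores


def selected(scores, threshold):
    """Indices of features whose score exceeds the threshold."""
    return {j for j, s in enumerate(scores) if s > threshold}


# Toy genotype matrix (made up): 12 samples x 3 SNPs, coded 0/1/2.
rows = [
    [0, 0, 0], [0, 1, 0], [0, 2, 1], [0, 0, 0], [0, 1, 1], [0, 2, 0],
    [1, 0, 2], [2, 1, 1], [1, 2, 2], [2, 0, 2], [1, 1, 1], [2, 2, 2],
]
labels = [int(r[0] >= 1) for r in rows]  # phenotype driven by SNP 0 only


def predict(row):
    """Hypothetical model that uses SNP 0 and ignores the other SNPs."""
    return int(row[0] >= 1)


by_score = selected(mean_difference_score(rows, labels), threshold=0.5)
by_permutation = selected(permutation_importance(predict, rows, labels), threshold=0.0)
confirmed = by_score & by_permutation  # SNPs supported by both analyses
print(sorted(by_score), sorted(by_permutation), sorted(confirmed))
```

In this toy data SNP 2 correlates with the phenotype and passes the score filter, but shuffling it does not hurt the model, so only SNP 0 survives the intersection. The same set intersection is what reduces two large candidate lists to the subset confirmed by both analyses.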
Genome-wide association studies present computational challenges for missing data imputation, while advances in genotyping technologies are generating datasets with large sample sizes and with sample sets genotyped on multiple SNP chips. We present a new framework, SparRec (Sparse Recovery), for imputation, with the following properties: (1) The optimization models of SparRec, based on low rank and a low number of co-clusters of matrices, differ from current statistical methods. While our low-rank matrix completion (LRMC) model is similar to Mendel-Impute, our matrix co-clustering factorization (MCCF) model is completely new. (2) SparRec, like other matrix completion methods, can be flexibly applied to missing data imputation for large meta-analyses with different cohorts genotyped on different sets of SNPs, even when there is no reference panel. This kind of meta-analysis is very challenging for current statistics-based methods. (3) SparRec has consistent performance and achieves high recovery accuracy even when the missing data rate is as high as 90%. Compared with Mendel-Impute, our low-rank based method achieves similar accuracy and efficiency, while the co-clustering based method has advantages in running time. The test results show that SparRec has significant advantages over, and performance competitive with, state-of-the-art statistical methods including Beagle and fastPhase.
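The low-rank assumption behind matrix-completion imputation can be illustrated with a deliberately small Python sketch. It is not the SparRec LRMC or MCCF model: it fits a rank-1 factorization by alternating least squares on the observed entries only, assuming genotypes coded as allele counts 0, 1, 2 and `None` marking a missing call. The function name `complete_rank1` and the toy matrix are assumptions made for illustration.

```python
def complete_rank1(matrix, n_iters=100):
    """Fill missing entries (None) using a rank-1 fit u[i] * v[j].

    Alternating least squares: with v fixed, each u[i] has a closed-form
    least-squares solution over the observed entries of row i, and vice versa.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(n_iters):
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if matrix[i][j] is not None]
            num = sum(matrix[i][j] * v[j] for j in obs)
            den = sum(v[j] ** 2 for j in obs)
            u[i] = num / den if den > 0 else 0.0
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if matrix[i][j] is not None]
            num = sum(matrix[i][j] * u[i] for i in obs)
            den = sum(u[i] ** 2 for i in obs)
            v[j] = num / den if den > 0 else 0.0
    # Keep observed values; use the rank-1 estimate only where data is missing.
    return [
        [matrix[i][j] if matrix[i][j] is not None else u[i] * v[j]
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy matrix (made up): samples x SNPs, exactly rank 1, two calls missing.
observed = [
    [None, 0, 1, 1],
    [2, 0, None, 2],
    [1, 0, 1, 1],
]
filled = complete_rank1(observed)
imputed = [[round(x) for x in row] for row in filled]
print(imputed)
```

Because no reference panel enters the computation, the same idea extends to cohorts genotyped on different SNP sets: entries for untyped SNPs are simply treated as missing. Real genotype matrices are only approximately low rank, so practical methods use higher ranks and regularization.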