Tn5 transposase, which can efficiently tagment the genome, has been widely adopted as a molecular tool in next-generation sequencing, from short-read sequencing to more complex methods such as assay for transposase-accessible chromatin using sequencing (ATAC-seq). Here, we systematically map Tn5 insertion characteristics across several model organisms and identify critical parameters that affect its insertion. On naked genomic DNA, we found that Tn5 insertion is neither uniformly distributed nor random. To uncover the drivers of these biases, we used a machine learning framework, which revealed that DNA shape works cooperatively with DNA sequence motifs to affect Tn5 insertion preference. These intrinsic insertion preferences can be modeled using nucleotide dependence information from DNA sequences, and we developed a computational pipeline to correct for these biases in ATAC-seq data. Using our pipeline, we show that bias correction improves the overall performance of ATAC-seq peak detection, recovering many potential false-negative peaks. Furthermore, we found that these recovered peaks are bound by transcription factors, underscoring the biological relevance of capturing this additional information. These findings highlight the benefits of an improved understanding and precise correction of Tn5 insertion preference.
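The general idea of correcting a sequence-dependent insertion bias can be illustrated with a much simpler stand-in than the pipeline described above. The sketch below is not the authors' method: it assumes a plain k-mer frequency ratio (observed at insertion sites on naked DNA versus genomic background) as the bias estimate, and down-weights each ATAC-seq insertion by the inverse of that ratio. The function names, the `floor` parameter, and the toy counts are all hypothetical.

```python
from collections import Counter

def kmer_bias(insertion_kmers, background_kmers):
    """Return observed/background frequency ratio per k-mer.

    insertion_kmers: Counter of k-mers centred on insertion sites in
                     naked genomic DNA (hypothetical input).
    background_kmers: Counter of the same k-mers across the genome.
    """
    obs_total = sum(insertion_kmers.values())
    bg_total = sum(background_kmers.values())
    bias = {}
    for kmer, bg_count in background_kmers.items():
        obs_freq = insertion_kmers.get(kmer, 0) / obs_total
        bg_freq = bg_count / bg_total
        bias[kmer] = obs_freq / bg_freq
    return bias

def corrected_weight(kmer, bias, floor=0.1):
    """Weight for one insertion: inverse of its k-mer bias.

    The floor (an assumed value) keeps rarely cut k-mers from
    receiving arbitrarily large weights; unseen k-mers get weight 1.
    """
    return 1.0 / max(bias.get(kmer, 1.0), floor)

# Toy data: "AA" is cut more often than its background frequency predicts.
background = Counter({"AA": 50, "CG": 50})
insertions = Counter({"AA": 30, "CG": 10})

bias = kmer_bias(insertions, background)
weights = {k: corrected_weight(k, bias) for k in background}
```

In this toy case the over-represented k-mer receives a weight below 1 and the under-represented one a weight above 1, so a corrected signal track would sum these weights rather than raw insertion counts. A model that captures dependence between nucleotide positions, as the abstract describes, would replace the independent k-mer table used here.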
Acute myeloid leukemia (AML) refers to a heterogeneous group of hematopoietic malignancies. The well-known European LeukemiaNet (ELN) classification stratifies AML patients into three risk groups, based primarily on the detection of cytogenetic abnormalities. However, the prognosis of cytogenetically normal AML (CN-AML), which is the largest AML subset, can be hard to define, and the clinical outcomes associated with this subgroup are diverse. In this study, using transcriptome profiles collected from CN-AML patients in the BeatAML cohort, we constructed a robust prognostic Cox model named NEST (Nine-gEne SignaTure). The validity of NEST was confirmed in four external independent cohorts. Moreover, the risk score predicted by the NEST model remained an independent prognostic factor in multivariate analyses. Further analysis revealed that the NEST model was suitable for bone marrow mononuclear cell (BMMC) samples but not peripheral blood mononuclear cell (PBMC) samples, which indirectly indicated subtle differences between BMMCs and PBMCs. Our data demonstrated the robustness and accuracy of the NEST model and implied the importance of immune dysfunction in CN-AML leukemogenesis, shedding new light on the further exploration of molecular mechanisms and treatment guidance for CN-AML.
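For readers unfamiliar with how a gene-signature Cox model yields a risk score, the following minimal sketch may help. In a Cox proportional hazards model the risk score is conventionally the linear predictor, that is, the sum of each gene's expression multiplied by its fitted coefficient. The gene names, coefficients, and expression values below are placeholders, not the NEST genes or weights, and the median split into high- and low-risk groups is an assumed convention rather than something stated in the abstract.

```python
from statistics import median

# Placeholder coefficients; the real NEST signature has nine genes
# whose names and weights are not given in the abstract.
COEFS = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.18}

def risk_score(expression, coefs=COEFS):
    """Cox linear predictor: sum of coefficient * expression per gene."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())

def stratify(scores):
    """Label patients above the cohort median as high risk (assumed cutoff)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Toy expression values for three hypothetical patients.
patients = {
    "p1": {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0},
    "p2": {"GENE_A": 0.0, "GENE_B": 2.0, "GENE_C": 1.0},
    "p3": {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 1.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)
```

A positive coefficient means higher expression of that gene raises the predicted hazard, and a negative one lowers it. Testing whether the score is an independent prognostic factor would then mean entering it alongside clinical covariates in a multivariate Cox regression, which this sketch does not attempt.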