Copy number variation (CNV) is a class of key biomarkers in many complex traits and diseases. Detecting CNV from sequencing data is a substantial bioinformatics problem and a standard requirement in clinical practice. Although many CNV detection approaches have been proposed, the core statistical model at their foundation is weakened by two critical computational issues: (i) identifying the optimal sliding-window setting and (ii) correcting for bias and noise. We designed a statistical process model to overcome these limitations by calculating regional read depths via an exponentially weighted moving average strategy. A one-run detection of CNVs of various lengths is then achieved by a dynamic sliding window, whose size is self-adapted according to the weighted averages. We also designed a novel bias/noise reduction model, accompanied by the moving average, which can handle complicated patterns and extend training data. This model, called PEcnv, accurately detects CNVs ranging from kb-scale to chromosome-arm level. The model performance was validated with simulated samples and real samples. Comparative analysis showed that PEcnv outperforms current popular approaches. Notably, PEcnv provided considerable advantages in detecting small CNVs (1 kb–1 Mb) in panel sequencing data. Thus, PEcnv fills the gap left by existing methods that focus on large CNVs. PEcnv may have broad applications in clinical testing, where panel sequencing is the dominant strategy. Availability and implementation: Source code is freely available at https://github.com/Sherwin-xjtu/PEcnv
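To make the moving-average idea concrete, here is a minimal sketch of an exponentially weighted moving average over per-bin read depths. It is not PEcnv's implementation: the function name `ewma`, the smoothing factor `alpha`, and the toy depth values are assumptions for illustration, and the dynamic window sizing and bias/noise model described in the abstract are not reproduced.

```python
def ewma(depths, alpha=0.3):
    """Exponentially weighted moving average of a read-depth series.

    alpha is a hypothetical smoothing factor in (0, 1]: larger values
    weight recent bins more heavily. The first bin seeds the average.
    """
    smoothed = []
    current = None
    for depth in depths:
        if current is None:
            current = depth  # seed with the first observation
        else:
            current = alpha * depth + (1 - alpha) * current
        smoothed.append(current)
    return smoothed


# Toy per-bin depths: a flat region followed by a gain.
bin_depths = [100, 102, 98, 101, 150, 152, 149]
smoothed_depths = ewma(bin_depths, alpha=0.3)
```

The smoothed series rises gradually after the jump at the fifth bin, which is the kind of signal a depth-based caller could threshold; how PEcnv turns the weighted averages into window sizes is not specified in the abstract.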
Delins, also known as a complex indel, is a combined genomic structural variation formed by deleting and inserting DNA fragments at a common genomic location. Recent studies have emphasized the importance of delins in cancer diagnosis and treatment. Although the long reads from PacBio CLR sequencing significantly facilitate delins calling, existing approaches still encounter computational challenges from the high level of sequencing errors and often introduce errors when genotyping and phasing delins. In this paper, we propose an efficient algorithmic pipeline, named delInsCaller, to identify delins at haplotype resolution from PacBio CLR sequencing data. delInsCaller uses a fault-tolerant method that calculates a variation density score, which helps to locate candidate mutational regions under a high level of sequencing errors. It adopts a base association-based contig splicing method, which facilitates contig splicing in the presence of false-positive interference. We conducted a series of experiments on simulated datasets, and the results showed that delInsCaller outperformed several state-of-the-art approaches, e.g., SVseq3, across a wide range of parameter settings, such as read depth and sequencing error rate. delInsCaller often obtained higher F-measures than other approaches; specifically, it maintained its advantage at ~15% sequencing errors. delInsCaller also significantly improved N50 values with almost no loss of haplotype accuracy compared with the existing approach.
Duplex sequencing technology has been widely used to detect low-frequency mutations in circulating tumor deoxyribonucleic acid (DNA), but determining the sequencing depth and other experimental parameters needed for stable detection of low-frequency mutations remains an open problem. The mutation detection rules of duplex sequencing constrain not only the number of mutated templates but also the number of mutation-supportive reads corresponding to each forward and reverse strand of the mutated templates. To tackle this problem, we propose a Depth Estimation model for stable detection of Low-Frequency MUTations in duplex sequencing (DELFMUT), which models the identity correspondence and quantitative relationships between templates and reads using the zero-truncated negative binomial distribution, without considering the base sequences themselves. The results of DELFMUT were verified with real duplex sequencing data. Given a known mutation frequency and mutation detection rule, DELFMUT can recommend combinations of DNA input and sequencing depth that guarantee stable detection of mutations, and it has great application value in guiding the experimental parameter settings of duplex sequencing technology.
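For readers unfamiliar with the distribution named above, the following sketch evaluates a zero-truncated negative binomial: a negative binomial conditioned on at least one observation, which suits read counts for templates that were actually sequenced. The parameterization (size `r`, success probability `p`) and the function names are assumptions; the abstract does not state how DELFMUT parameterizes or fits the distribution.

```python
import math


def nb_pmf(k, r, p):
    """Negative binomial P(X = k) with size r > 0 and success prob 0 < p < 1."""
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log(1 - p))


def ztnb_pmf(k, r, p):
    """Zero-truncated version: P(X = k | X >= 1)."""
    if k < 1:
        return 0.0
    # P(X = 0) = p ** r, so renormalize by the remaining mass.
    return nb_pmf(k, r, p) / (1.0 - p ** r)


def ztnb_at_least(k_min, r, p):
    """P(X >= k_min | X >= 1), e.g. a strand having at least k_min reads."""
    below = sum(ztnb_pmf(k, r, p) for k in range(1, k_min))
    return 1.0 - below


# Illustrative parameters only (not taken from the paper).
prob_two_or_more = ztnb_at_least(2, r=2.0, p=0.3)
```

A detection rule that requires a minimum number of supporting reads per strand could combine such tail probabilities across strands and templates; that combination step is specific to DELFMUT and is not attempted here.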
The formation and development of potato tissues and organs is a complex process regulated by a variety of genes and environmental factors. However, the regulatory mechanisms underlying growth and development are still unclear. In this study, we used the autotetraploid potato JC14 as the experimental subject and analyzed the transcriptomes of root, stem, and leaf at the seedling, tuber formation, and tuber expansion stages to explore the spatio-temporal expression patterns of genes and the genetic characteristics of development. We identified thousands of differentially expressed genes, and KEGG pathway enrichment analysis showed that these genes were mainly involved in defense response and carbohydrate metabolism pathways. A total of 12 co-expressed gene modules were identified by Weighted Gene Co-expression Network Analysis (WGCNA), and the 4 modules with the highest correlation with potato stem developmental traits were selected. Core genes in the network were further investigated and functionally annotated by computing the connectivity of genes within each module. The results revealed a number of hub genes in stems at different developmental stages, including carbohydrate metabolism-related genes, defense response-related genes, and transcription factors. These findings provide important leads for further understanding the molecular regulation and genetic mechanisms of potato tissue development.
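The hub-gene step can be illustrated with a small sketch of intramodular connectivity in the usual WGCNA sense: an unsigned adjacency |cor|^beta between every pair of genes in a module, summed per gene. This is written in plain Python rather than with the WGCNA R package, and the soft-threshold power `beta=6`, the gene names, and the expression values are assumptions; the abstract does not report the settings used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def intramodular_connectivity(module_expr, beta=6):
    """Sum of unsigned adjacencies |cor|**beta to all other module genes."""
    genes = list(module_expr)
    connectivity = {gene: 0.0 for gene in genes}
    for i, gene_i in enumerate(genes):
        for gene_j in genes[i + 1:]:
            adjacency = abs(pearson(module_expr[gene_i], module_expr[gene_j])) ** beta
            connectivity[gene_i] += adjacency
            connectivity[gene_j] += adjacency
    return connectivity


# Hypothetical module: expression of three genes across four samples.
module_expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [1.0, -1.0, 1.0, -1.0],
}
connectivity = intramodular_connectivity(module_expr)
hub_gene = max(connectivity, key=connectivity.get)
```

Genes with the highest connectivity within a module are the usual hub candidates; in this toy module the two perfectly correlated genes tie for the top score.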