“…The design of the simulation experiments is described as follows: we adopt our previously developed simulation tool IntSIM (Yuan et al, 2017) to produce various datasets with varying tumor purity from 0.2 to 0.4 and varying sequencing coverage depth from 4× to 6× (Yuan et al, 2019b). In each simulation configuration, 50 replicated samples are generated for a sufficient test of our proposed method and the peer methods.…”
Section: Simulation Studies
Citation type: mentioning
confidence: 99%
“…However, when facing with relatively low-coverage-depth data, the false-positive rate of CNVnator is not easy to control due to the influence from artifacts such as GC-content bias and uneven distribution of reads, although the CNVnator method has dealt with the GC bias in a reasonable way. Other popular RD-based methods include ReadDepth (Miller et al, 2011), XCAVATOR (Magi et al, 2017), Wavedec (Cai et al, 2018), seqCNV (Chen et al, 2017), iCopyDAV (Dharanipragada et al, 2018), GROM-RD (Smith et al, 2015), CONDEL (Yuan et al, 2018a), CLImAT (Yu et al, 2014), CNV_IFTV (Yuan et al, 2019b), m-HMM (Wang et al, 2014), DCC (Yuan et al, 2018c), CNV-seq (Xie and Tammi, 2009), and FREEC (Boeva et al, 2012). The characteristics of the existing methods are listed in Table 1.…”
Copy number variation (CNV) is an important phenomenon in tumor genomes and plays a significant role in tumorigenesis. Accurate detection of CNVs has become a routine and necessary procedure for in-depth investigation of tumor cells and diagnosis of tumor patients. Next-generation sequencing (NGS) technology has provided a wealth of data for the detection of CNVs at base-pair resolution. However, this task is usually influenced by a number of factors, including GC-content bias, sequencing errors, and correlations among adjacent positions within CNVs. Although many existing methods deal with some of these artifacts through their own strategies, a comprehensive consideration of all the factors is still lacking. In this paper, we propose a new method, MFCNV, for accurate detection of CNVs from NGS data. Compared with existing methods, the proposed method has the following characteristics: (1) it fully considers the intrinsic correlations among adjacent positions in the genome to be analyzed, (2) it calculates read depth, GC-content bias, base quality, and correlation value for each genome bin and combines them as multiple features for the evaluation of genome bins, and (3) it addresses the joint effect among the factors by training a neural network for the prediction of CNVs. We test the performance of MFCNV on simulated and real sequencing data and compare it with several peer methods. The results demonstrate that our method is superior to the other methods in terms of sensitivity, precision, and F1-score and can detect many CNVs that the other methods have not discovered. MFCNV is expected to be a complementary tool in the analysis of mutations in tumor genomes and can be extended to the analysis of single-cell sequencing data.
“…In any version of a reference genome, there are a large number of N values in genome positions (Yuan et al, 2019). The value of N means that the base has not been determined yet in the construction of the reference genome.…”
Section: Processing Of N Positions
Citation type: mentioning
confidence: 99%
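The excerpt above does not say how the citing paper handles N positions, so the following is only a minimal sketch of how such positions could be located in a reference sequence. The function names `n_runs` and `bin_n_fraction` and the fixed-size binning are illustrative assumptions, not taken from the cited work.

```python
import re


def n_runs(sequence):
    """Return half-open (start, end) intervals of consecutive N bases.

    Hypothetical helper: matches both upper- and lower-case N.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", sequence)]


def bin_n_fraction(sequence, bin_size):
    """Return the fraction of N bases in each fixed-size bin.

    The last bin may be shorter than bin_size. Binning is an assumption
    made for illustration only.
    """
    fractions = []
    for start in range(0, len(sequence), bin_size):
        chunk = sequence[start:start + bin_size].upper()
        fractions.append(chunk.count("N") / len(chunk))
    return fractions
```

Bins with a high N fraction could then be skipped or masked before read depths are computed; which of these the citing authors do is not stated in the excerpt.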
“…The symbol λ represents the penalty parameter that controls the trade-off between the first term (which can be called fitting error) and the second term (which can be called the total variation penalty). It is difficult to determine the value of λ (Condat, 2013; Duan et al, 2013; Yuan et al, 2019). When it tends to zero, the effect of the penalty term is minimal, and a is equal to b.…”
Section: Denoising Using TV (Total Variation)
Citation type: mentioning
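The trade-off described in the excerpt can be illustrated with the standard one-dimensional total variation objective. The excerpt does not show the formula itself, so the form below is an assumption: `b` is taken to be the observed signal, `a` the denoised one, and the factor 0.5 on the fitting error is a common convention rather than something the citing paper confirms.

```python
def tv_objective(a, b, lam):
    """Evaluate an assumed 1D total variation objective.

    fitting error:  0.5 * sum_i (b[i] - a[i])**2
    TV penalty:     sum_i |a[i+1] - a[i]|
    lam (λ) weights the penalty against the fitting error.
    """
    fit = 0.5 * sum((bi - ai) ** 2 for ai, bi in zip(a, b))
    penalty = sum(abs(a[i + 1] - a[i]) for i in range(len(a) - 1))
    return fit + lam * penalty


# Hypothetical read-depth-like signal with one jump.
b = [1.0, 1.2, 5.0, 5.1]
flat = [sum(b) / len(b)] * len(b)

# With λ = 0 the penalty vanishes, so a = b attains the minimum value 0.
cost_copy_small = tv_objective(b, b, 0.0)
# With a large λ, a flat signal is cheaper than reproducing b exactly.
cost_copy_large = tv_objective(b, b, 10.0)
cost_flat_large = tv_objective(flat, b, 10.0)
```

This only evaluates the objective; it does not minimize it. Direct solvers for this problem exist (the excerpt cites Condat, 2013), but reproducing one is beyond a short sketch.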
“…AITAC is written and implemented in Python. To make a complete analysis pipeline from sequencing data to a report on tumor purity and absolute copy numbers, we incorporate our previously developed CNV detection method, CNV_IFTV (Yuan et al, 2019b), into the AITAC algorithm. The source code of AITAC is available at https://github.com/BDanalysis/aitac and can be downloaded freely.…”
Inference of absolute copy numbers in tumor genomes is one of the key points in the study of tumorigenesis. However, the mixture of tumor and normal cells poses a major challenge to this task. Accurate estimation of tumor purity (i.e., the fraction of tumor cells) is a necessary step to solve this problem. In this paper, we propose a new approach, AITAC, to accurately infer tumor purity and absolute copy numbers in a tumor sample by using high-throughput sequencing (HTS) data. In contrast to many existing algorithms for estimating tumor purity, which usually rely on pre-detected mutation genotypes (heterogeneity and homogeneity), AITAC requires only the read depths (RDs) observed at regions with copy number losses. AITAC creates a non-linear model to correlate tumor purity with observed and expected RDs. It adopts an exhaustive search strategy to scan tumor purity over a wide range, and chooses the tumor purity that minimizes the deviation between observed and expected RDs as the optimal solution. We apply the proposed approach to both simulated and real sequencing data sets and demonstrate its performance by comparing it with two classical approaches. AITAC is freely available at https://github.com/BDanalysis/aitac and can be expected to become a useful approach for researchers analyzing copy numbers in cancer genomes.
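The exhaustive search described in the abstract can be sketched as a simple grid scan. The abstract does not give AITAC's non-linear model, so this sketch swaps in a textbook two-component mixture (tumor cells with copy number `tumor_cn`, normal cells with copy number 2) purely for illustration. The function names, the squared-error deviation, and the 0.01 step are all assumptions and should not be read as AITAC's actual implementation.

```python
def expected_rd(purity, tumor_cn, diploid_rd):
    """Expected read depth under an assumed linear tumor/normal mixture.

    diploid_rd is the read depth of a copy-neutral (2-copy) region.
    This is NOT the non-linear model used by AITAC.
    """
    return diploid_rd * (purity * tumor_cn + (1.0 - purity) * 2.0) / 2.0


def estimate_purity(observed_rds, tumor_cns, diploid_rd, step=0.01):
    """Scan purity over (0, 1] and return the value with minimal deviation.

    observed_rds: read depths observed at copy-number-loss regions.
    tumor_cns:    assumed tumor copy number of each region (e.g. 1 or 0).
    """
    best_purity, best_dev = None, float("inf")
    n_steps = int(round(1.0 / step))
    for k in range(1, n_steps + 1):
        purity = k * step
        dev = sum(
            (obs - expected_rd(purity, cn, diploid_rd)) ** 2
            for obs, cn in zip(observed_rds, tumor_cns)
        )
        if dev < best_dev:
            best_purity, best_dev = purity, dev
    return best_purity


# Hypothetical input: two loss regions simulated at purity 0.4
# with a diploid read depth of 30.
purity_hat = estimate_purity([24.0, 18.0], [1, 0], diploid_rd=30.0)
```

A grid scan like this trades speed for simplicity: it needs no gradient and cannot get stuck in a local minimum, at the cost of a resolution limited by `step`.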