CNV_IFTV: An Isolation Forest and Total Variation-Based Detection of CNVs from Short-Read Sequencing Data

Yuan, Xiguo; Yu, Jun; Xi, Jianing; Yang, Liying; Shang, Junliang; Li, Zhe; Duan, Junbo

doi:10.1109/tcbb.2019.2920889

Cited by 39 publications

(53 citation statements)

References 71 publications


“…The design of the simulation experiments is described as follows: we adopt our previously developed simulation tool IntSIM (Yuan et al, 2017) to produce various datasets with varying tumor purity from 0.2 to 0.4 and varying sequencing coverage depth from 4× to 6× (Yuan et al, 2019b). In each simulation configuration, 50 replicated samples are generated for a sufficient test of our proposed method and the peer methods.…”

Section: Simulation Studies (mentioning)

confidence: 99%

“…However, when facing with relatively low-coverage-depth data, the false-positive rate of CNVnator is not easy to control due to the influence from artifacts such as GC-content bias and uneven distribution of reads, although the CNVnator method has dealt with the GC bias in a reasonable way. Other popular RD-based methods include ReadDepth (Miller et al, 2011), XCAVATOR (Magi et al, 2017), Wavedec (Cai et al, 2018), seqCNV (Chen et al, 2017), iCopyDAV (Dharanipragada et al, 2018), GROM-RD (Smith et al, 2015), CONDEL (Yuan et al, 2018a), CLImAT (Yu et al, 2014), CNV_IFTV (Yuan et al, 2019b), m-HMM (Wang et al, 2014), DCC (Yuan et al, 2018c), CNV-seq (Xie and Tammi, 2009), and FREEC (Boeva et al, 2012). The characteristics of the existing methods are listed in Table 1.…”

Section: Introduction (mentioning)

confidence: 99%


MFCNV: A New Method to Detect Copy Number Variations From Next-Generation Sequencing Data

Zhao, Huang, et al. 2020. Front. Genet. (self-citation)

Copy number variation (CNV) is a very important phenomenon in tumor genomes and plays a significant role in tumor genesis. Accurate detection of CNVs has become a routine and necessary procedure for a deep investigation of tumor cells and diagnosis of tumor patients. Next-generation sequencing (NGS) technique has provided a wealth of data for the detection of CNVs at base-pair resolution. However, such task is usually influenced by a number of factors, including GC-content bias, sequencing errors, and correlations among adjacent positions within CNVs. Although many existing methods have dealt with some of these artifacts by designing their own strategies, there is still a lack of comprehensive consideration of all the factors. In this paper, we propose a new method, MFCNV, for an accurate detection of CNVs from NGS data. Compared with existing methods, the characteristics of the proposed method include the following: (1) it makes a full consideration of the intrinsic correlations among adjacent positions in the genome to be analyzed, (2) it calculates read depth, GC-content bias, base quality, and correlation value for each genome bin and combines them as multiple features for the evaluation of genome bins, and (3) it addresses the joint effect among the factors via training a neural network algorithm for the prediction of CNVs. We test the performance of the MFCNV method by using simulation and real sequencing data and make comparisons with several peer methods. The results demonstrate that our method is superior to other methods in terms of sensitivity, precision, and F1-score and can detect many CNVs that other methods have not discovered. MFCNV is expected to be a complementary tool in the analysis of mutations in tumor genomes and can be extended to be applied to the analysis of single-cell sequencing data.
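The per-bin features named in the abstract can be illustrated with a minimal sketch covering two of them, mean read depth and GC content. The function name `bin_features`, the fixed-size non-overlapping bins, and the per-position depth list are assumptions made for illustration; this is not MFCNV's implementation, and base quality and the correlation value are omitted.

```python
def bin_features(sequence, depth, bin_size):
    """Return (start, mean_read_depth, gc_fraction) for each full bin.

    sequence : reference bases as a string (hypothetical input)
    depth    : per-position read depth, same length as sequence
    bin_size : number of positions per non-overlapping bin
    """
    if len(sequence) != len(depth):
        raise ValueError("sequence and depth must have the same length")
    features = []
    # Walk over full bins only; a trailing partial bin is ignored.
    for start in range(0, len(sequence) - bin_size + 1, bin_size):
        bases = sequence[start:start + bin_size].upper()
        gc_fraction = (bases.count("G") + bases.count("C")) / bin_size
        mean_depth = sum(depth[start:start + bin_size]) / bin_size
        features.append((start, mean_depth, gc_fraction))
    return features


if __name__ == "__main__":
    for row in bin_features("GGCCATAT", [4, 4, 4, 4, 2, 2, 2, 2], 4):
        print(row)
```

Each tuple could then serve as part of a feature vector for one genome bin, with the remaining features appended in the same way.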

“…In any version of a reference genome, there are a large number of N values in genome positions (Yuan et al, 2019). The value of N means that the base has not been determined yet in the construction of the reference genome.…”

Section: Processing Of N Positions (mentioning)

confidence: 99%

“…The symbol λ represents the penalty parameter that controls the trade-off between the first term (which can be called fitting error) and the second term (which can be called the total variation penalty). It is difficult to determine the value of λ (Condat, 2013;Duan et al, 2013;Yuan et al, 2019). When it tends to zero, the effect of the penalty term is minimal, and a is equal to b.…”

Section: Denoising Using TV (Total Variation) (mentioning)

confidence: 99%
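The trade-off that λ controls can be illustrated with a minimal sketch. It assumes the usual one-dimensional formulation, minimizing 0.5 * sum((a[i] - b[i])^2) + λ * sum(|a[i+1] - a[i]|), with b as the observed profile and a as the denoised one; the quoted passage does not say which symbol plays which role, so that assignment is an assumption. The function name `tv_denoise_1d` and the solver (projected gradient on the dual problem) are illustrative choices, not the method used by DINTD or CNV_IFTV.

```python
def _primal_from_dual(b, z):
    """Recover a = b - D^T z, where D is the forward-difference operator."""
    n = len(b)
    a = []
    for i in range(n):
        left = z[i - 1] if i > 0 else 0.0
        right = z[i] if i < n - 1 else 0.0
        a.append(b[i] - (left - right))
    return a


def tv_denoise_1d(b, lam, n_iter=5000):
    """Approximate argmin_a 0.5*sum((a-b)^2) + lam*sum(|a[i+1]-a[i]|).

    Uses projected gradient ascent on the dual variable z, which is
    constrained to the box [-lam, lam]. Step size 0.25 is safe because
    the largest eigenvalue of D D^T is below 4.
    """
    n = len(b)
    if n < 2 or lam <= 0:
        # No penalty: the fitting term alone gives a == b.
        return [float(x) for x in b]
    z = [0.0] * (n - 1)
    tau = 0.25
    for _ in range(n_iter):
        a = _primal_from_dual(b, z)
        for i in range(n - 1):
            step = z[i] + tau * (a[i + 1] - a[i])
            z[i] = max(-lam, min(lam, step))  # project onto the box
    return _primal_from_dual(b, z)


if __name__ == "__main__":
    noisy = [2.0, 2.1, 1.9, 2.0, 4.0, 4.1, 3.9, 4.0]
    for lam in (0.0, 0.2, 100.0):
        print(lam, [round(x, 3) for x in tv_denoise_1d(noisy, lam)])
```

With `lam = 0` the output equals the input, matching the statement that a equals b as λ tends to zero; a very large `lam` flattens the profile to its mean, and intermediate values keep large jumps while smoothing small fluctuations.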

DINTD: Detection and Inference of Tandem Duplications From Short Sequencing Reads

Dong, Wang, et al. 2020. Front. Genet. (self-citation)

“…AITAC is written and implemented in Python. To make a complete analysis pipeline from sequencing data to a report on tumor purity and absolute copy numbers, we incorporate our previously developed CNV detection method, CNV_IFTV (Yuan et al, 2019b), into the AITAC algorithm. The source code of AITAC is available at https://github.com/BDanalysis/aitac and can be downloaded freely.…”

Section: Introduction (mentioning)

confidence: 99%

Accurate Inference of Tumor Purity and Absolute Copy Numbers From High-Throughput Sequencing Data

Yuan, Zhao, et al. 2020. Front. Genet. (self-citation)

Inference of absolute copy numbers in tumor genomes is one of the key points in the study of tumor genesis. However, the mixture of tumor and normal cells poses a big challenge to this task. Accurate estimation of tumor purity (i.e., the fraction of tumor cells) is a necessary step to solve this problem. In this paper, we propose a new approach, AITAC, to accurately infer tumor purity and absolute copy numbers in a tumor sample by using high-throughput sequencing (HTS) data. In contrast to many existing algorithms for estimating tumor purity, which usually rely on pre-detected mutation genotypes (heterogeneity and homogeneity), AITAC just requires read depths (RDs) observed at the regions with copy number losses. AITAC creates a non-linear model to correlate tumor purity, observed and expected RDs. It adopts an exhaustive search strategy to scan tumor purity in a wide range, and chooses the tumor purity that minimizes the deviation between observed RDs and expected ones as the optimal solution. We apply the proposed approach to both simulation and real sequencing data sets and demonstrate its performance by comparing with two classical approaches. AITAC is freely available at https://github.com/BDanalysis/aitac and can be expected to become a useful approach for researchers to analyze copy numbers in cancer genome.
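The exhaustive search described in the abstract can be sketched as a grid scan over candidate purities. The abstract does not give AITAC's non-linear model, so this sketch substitutes a simple mixture model in which the expected read-depth ratio is (purity * tumor_cn + (1 - purity) * 2) / 2; that model, the names `expected_ratio` and `grid_search_purity`, the squared-error deviation, and the 0.01 step are all assumptions for illustration, not AITAC's implementation.

```python
def expected_ratio(purity, tumor_cn, normal_cn=2):
    """Expected read-depth ratio of a tumor/normal mixture (assumed model)."""
    return (purity * tumor_cn + (1.0 - purity) * normal_cn) / normal_cn


def grid_search_purity(observed_ratios, tumor_cns, step=0.01):
    """Scan purity over (0, 1] and return the value with the smallest
    squared deviation between observed and expected ratios.

    observed_ratios : read-depth ratios at copy-number-loss regions
    tumor_cns       : assumed tumor copy number (0 or 1) for each region
    """
    best_purity, best_deviation = None, float("inf")
    n_steps = round(1.0 / step)
    for k in range(1, n_steps + 1):
        purity = k * step
        deviation = sum(
            (obs - expected_ratio(purity, cn)) ** 2
            for obs, cn in zip(observed_ratios, tumor_cns)
        )
        if deviation < best_deviation:
            best_purity, best_deviation = purity, deviation
    return best_purity


if __name__ == "__main__":
    cns = [1, 1, 0]
    observed = [expected_ratio(0.3, cn) for cn in cns]  # synthetic, noise-free
    print(grid_search_purity(observed, cns))
```

An exhaustive scan is slower than a closed-form or gradient-based fit, but it makes no assumption about the shape of the deviation curve, which suits a model that is non-linear in purity.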

