Copy number aberrations (CNAs), which are pathogenic copy number variations (CNVs), play an important role in the initiation and progression of cancer. Single-cell DNA-sequencing (scDNAseq) technologies produce data that are ideal for inferring CNAs. In this review, we survey eight methods that have been developed for detecting CNAs in scDNAseq data and categorize them according to the steps of a seven-step pipeline that they employ. Furthermore, we review models and methods for evolutionary analyses of CNAs from scDNAseq data and highlight advances and future research directions for computational methods for CNA detection from scDNAseq data.
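The abstract does not list the seven pipeline steps. As a rough illustration of the read-depth idea that tools such as Ginkgo and HMMcopy build on, the sketch below bins read positions, scales bin counts so that the median bin corresponds to an assumed diploid baseline, rounds to integer copy numbers, and merges adjacent bins with equal calls into segments. The function names (`bin_counts`, `call_copy_number`, `merge_segments`) are hypothetical, and real pipelines include steps omitted here, such as mappability filtering, GC-bias correction, and statistical segmentation.

```python
from statistics import median


def bin_counts(read_positions, chrom_length, bin_size):
    """Count read start positions (0-based) in fixed-width, non-overlapping bins."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        if 0 <= pos < chrom_length:  # ignore positions outside the chromosome
            counts[pos // bin_size] += 1
    return counts


def call_copy_number(counts, ploidy=2):
    """Scale counts so the median bin equals `ploidy` (an assumed baseline), then round."""
    med = median(counts)
    if med == 0:
        raise ValueError("median bin count is zero; cannot normalize")
    return [round(ploidy * c / med) for c in counts]


def merge_segments(calls):
    """Merge runs of adjacent bins with the same call into (start_bin, end_bin, copy_number)."""
    segments = []
    for i, cn in enumerate(calls):
        if segments and segments[-1][2] == cn:
            start, _, _ = segments[-1]
            segments[-1] = (start, i, cn)  # extend the current segment
        else:
            segments.append((i, i, cn))  # open a new segment
    return segments


counts = [100, 98, 102, 150, 148, 52, 101]
calls = call_copy_number(counts)
print(calls)                  # [2, 2, 2, 3, 3, 1, 2]
print(merge_segments(calls))  # [(0, 2, 2), (3, 4, 3), (5, 5, 1), (6, 6, 2)]
```

Using the median rather than the mean as the baseline keeps a few strongly amplified bins from dragging every other call downward, though it still assumes most of the genome is at the baseline ploidy.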
Deep Learning (DL) has recently enabled unprecedented advances in one of the grand challenges in computational biology: the half-century-old problem of protein structure prediction. In this paper, we discuss recent advances, limitations, and future perspectives of DL in five broad areas: protein structure prediction, protein function prediction, genome engineering, systems biology and data integration, and phylogenetic inference. We discuss each application area and cover the main bottlenecks of DL approaches, such as training data, problem scope, and the ability to leverage existing DL architectures in new contexts. To conclude, we summarize the subject-specific and general challenges for DL across the biosciences.
Single-cell DNA sequencing technologies are enabling the study of mutations and their evolutionary trajectories in cancer. Somatic copy number aberrations (CNAs) have been implicated in the development and progression of various types of cancer. A wide array of methods for CNA detection has been either developed specifically for or adapted to single-cell DNA sequencing data. Understanding the strengths and limitations unique to each of these methods is crucial for obtaining accurate copy number profiles from single-cell DNA sequencing data. We benchmarked three widely used methods (Ginkgo, HMMcopy, and CopyNumber) on simulated as well as real datasets. To facilitate this, we developed a novel simulator of single-cell genome evolution in the presence of CNAs. Furthermore, to assess performance on empirical data where the ground truth is unknown, we introduce a phylogeny-based measure for identifying potentially erroneous inferences. While single-cell DNA sequencing is very promising for elucidating and understanding CNAs, our findings show that even the best existing method does not exceed 80% accuracy. New methods that significantly improve upon the accuracy of these three methods are needed. Furthermore, given the large datasets being generated, such methods must also be computationally efficient.
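The abstract reports accuracy without defining it. One plausible definition, used below purely as an assumption, is the fraction of (cell, bin) pairs whose inferred integer copy number matches the simulated ground truth. The function name `copy_number_accuracy` and the input layout (a cell ID mapped to a per-bin list of copy numbers) are hypothetical.

```python
def copy_number_accuracy(true_profiles, inferred_profiles):
    """Fraction of (cell, bin) pairs whose inferred copy number equals the true one.

    Both arguments map a cell ID to a list of per-bin integer copy numbers.
    This per-bin definition of accuracy is an assumption for illustration.
    """
    total = correct = 0
    for cell, truth in true_profiles.items():
        inferred = inferred_profiles[cell]
        if len(inferred) != len(truth):
            raise ValueError(f"cell {cell!r}: profile lengths differ")
        for true_cn, inferred_cn in zip(truth, inferred):
            total += 1
            correct += int(true_cn == inferred_cn)
    if total == 0:
        raise ValueError("no bins to compare")
    return correct / total


truth = {"cell1": [2, 2, 3, 3], "cell2": [2, 1, 1, 2]}
inferred = {"cell1": [2, 2, 2, 3], "cell2": [2, 1, 1, 2]}
print(copy_number_accuracy(truth, inferred))  # 0.875
```

Under this definition, 80% accuracy means that roughly one in five (cell, bin) calls is wrong.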
Intelligence is the ability to learn from experience and use domain experts' knowledge to adapt to new situations. In this context, an intelligent Recommender System should be able to learn from domain experts' knowledge and experience, as it is vital to know the domain in which the items will be recommended. Traditionally, Recommender Systems have been recognized as playlist generators for video/music services (e.g., Netflix and Spotify), e-commerce product recommenders (e.g., Amazon and eBay), or social content recommenders (e.g., Facebook and Twitter). However, Recommender Systems in modern enterprises are highly data- and knowledge-driven and may rely on users' cognitive aspects such as personality, behavior, and attitude. In this paper, we survey and summarize previously published studies on Recommender Systems to help readers understand our method's contributions to the field in this context. We discuss the current limitations of state-of-the-art approaches to Recommender Systems and the need for our new approach: a vision and a general framework for a new type of data-driven, knowledge-driven, and cognition-driven Recommender System, namely, Cognitive Recommender Systems. Cognitive Recommender Systems will be a new type of intelligent Recommender System that understands users' preferences, detects changes in those preferences over time, predicts users' unknown favorites, and explores adaptive mechanisms to enable intelligent actions within complex and changing environments. We present a motivating scenario in banking and argue that existing Recommender Systems: (i) do not use domain experts' knowledge to adapt to new situations; (ii) may not be able to predict the ratings or preferences a customer would give to a product (e.g., a loan, deposit, or trust service); and (iii) do not support data capture and analytics around customers' cognitive activities or use such data to provide intelligent and time-aware recommendations.
Copy number aberrations (CNAs) are ubiquitous in many types of cancer. Inferring CNAs from cancer genomic data could help shed light on the initiation, progression, and potential treatment of cancer. While such data have traditionally been obtained via "bulk sequencing," the more recently introduced techniques for single-cell DNA sequencing (scDNAseq) provide data that make CNA inference possible at single-cell resolution. In this paper, we introduce a new birth-death evolutionary model of CNAs as well as a Bayesian method, NestedBD, for inferring evolutionary trees (topologies and branch lengths with relative mutation rates) from single-cell data under this model. We assessed the accuracy of our method on both simulated and biological data and compared it to that of two standard phylogenetic methods, namely neighbor-joining and maximum parsimony (MP). We show through simulations that our method infers more accurate topologies and branch lengths. We also studied ancestral state reconstruction accuracy under the birth-death evolutionary model and found that it outperformed MP. Finally, when we ran all three methods on a colorectal cancer dataset, only the phylogeny inferred by NestedBD clearly separated the primary tumor cells from the metastatic ones, providing a more plausible history of the tumor cells.
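To give a concrete sense of what a birth-death model of copy number evolution can look like, the sketch below runs a Gillespie simulation of a generic linear birth-death process for the copy number of a single genomic bin along one branch: each copy is duplicated at rate `birth_rate` and lost at rate `death_rate`, and copy number 0 is absorbing. This generic process is an assumption for illustration and is not necessarily the exact model used by NestedBD; the function names are hypothetical. For this process, the expected copy number after time t is n0 · exp((birth_rate - death_rate) · t), which the second function computes for comparison.

```python
import math
import random


def simulate_copy_number(n0, birth_rate, death_rate, t_max, rng=None):
    """Gillespie simulation of a linear birth-death process for one bin's copy number.

    With n copies present, the next event occurs after an exponential waiting
    time with rate n * (birth_rate + death_rate); it is a gain (n + 1) with
    probability birth_rate / (birth_rate + death_rate), otherwise a loss (n - 1).
    Copy number 0 is absorbing (a lost bin cannot be regained).
    """
    rng = rng or random.Random()
    n, t = n0, 0.0
    per_copy_rate = birth_rate + death_rate
    while n > 0 and per_copy_rate > 0:
        t += rng.expovariate(n * per_copy_rate)
        if t > t_max:  # next event would fall after the end of the branch
            break
        if rng.random() < birth_rate / per_copy_rate:
            n += 1
        else:
            n -= 1
    return n


def expected_copy_number(n0, birth_rate, death_rate, t):
    """Analytical mean of the linear birth-death process: n0 * exp((birth - death) * t)."""
    return n0 * math.exp((birth_rate - death_rate) * t)


rng = random.Random(0)
runs = [simulate_copy_number(2, 0.3, 0.2, 1.0, rng) for _ in range(20000)]
# The empirical mean should be close to the analytical value (about 2.21).
print(sum(runs) / len(runs), expected_copy_number(2, 0.3, 0.2, 1.0))
```

Passing an explicit `random.Random` instance keeps simulations reproducible, which matters when the same simulated trees are reused to benchmark several inference methods.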
Background: Current sequencing technologies are moving toward generating longer and noisier reads. Accurate alignment of these reads therefore plays an important role in any downstream analysis. Similarly, the overall cost of sequencing depends in part on the running time of the aligner. The trade-off between accuracy and speed is the main challenge in designing long-read aligners. Results: We propose Meta-aligner, which aligns long and very long reads to the reference genome efficiently and accurately. Meta-aligner incorporates existing short- and long-read aligners as subcomponents and uses statistics from the reference genome to improve performance; it estimates these statistics from the reads and the reference genome automatically. Meta-aligner is implemented in C++ and runs on popular POSIX-like operating systems such as Linux. Conclusions: Meta-aligner achieves high recall and precision, especially for long reads and at high error rates. It also improves alignment performance for PacBio long reads compared with traditional schemes.
Single-cell sequencing provides a powerful approach for elucidating intratumor heterogeneity by resolving cell-to-cell variability. However, it also poses additional challenges, including elevated error rates, allelic dropout, and non-uniform coverage. A recently introduced single-cell-specific mutation detection algorithm leverages the evolutionary relationship between cells to denoise the data. However, because of its probabilistic nature, this method does not scale well with the number of cells. Here, we develop a novel combinatorial approach that utilizes the genealogical relationship of cells to detect mutations from noisy single-cell sequencing data. Our method, called scVILP, jointly detects mutations in individual cells and reconstructs a perfect phylogeny among these cells. We employ a novel Integer Linear Program to solve the joint inference problem deterministically and efficiently. We show on simulated data that scVILP achieves similar or better accuracy than existing methods with a significantly shorter runtime. We also applied scVILP to an empirical human cancer dataset from a patient with high-grade serous ovarian cancer.
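A perfect phylogeny here means that each mutation arises exactly once in the tree and is never lost. For a binary cells-by-mutations matrix, and assuming the root is the unmutated (all-zero) genotype, such a tree exists if and only if no pair of mutations shows all three patterns (1,1), (1,0), and (0,1) across cells (the three-gamete test). The sketch below checks that condition. It is not scVILP's Integer Linear Program, which must also decide which noisy entries to correct, but it shows the combinatorial constraint that a perfect-phylogeny-based mutation caller has to satisfy. The function names are hypothetical.

```python
from itertools import combinations


def conflicting_pairs(matrix):
    """Return mutation-column pairs that violate the rooted three-gamete condition.

    `matrix` is a list of rows (cells) of 0/1 entries (mutations). Assuming an
    all-zero root, columns p and q conflict when some cell has both mutations,
    some cell has only p, and some cell has only q.
    """
    if not matrix:
        return []
    n_mutations = len(matrix[0])
    conflicts = []
    for p, q in combinations(range(n_mutations), 2):
        patterns = {(row[p], row[q]) for row in matrix}
        if {(1, 1), (1, 0), (0, 1)} <= patterns:
            conflicts.append((p, q))
    return conflicts


def admits_perfect_phylogeny(matrix):
    """True if the matrix fits a rooted perfect phylogeny (no conflicting pairs)."""
    return not conflicting_pairs(matrix)


clean = [[1, 0, 0],
         [1, 1, 0],
         [0, 0, 1]]
noisy = [[1, 1],
         [1, 0],
         [0, 1]]
print(admits_perfect_phylogeny(clean))  # True
print(conflicting_pairs(noisy))         # [(0, 1)]
```

In noisy single-cell data, a single allelic dropout or false-positive call can create such a conflict. A joint method such as scVILP can be viewed as choosing, among conflict-free matrices, one that best fits the sequencing data.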
Motivation: Single-nucleotide variants (SNVs) are the most common variations in the human genome. Recently developed methods for SNV detection from single-cell DNA sequencing data, such as SCIΦ and scVILP, leverage the evolutionary history of the cells to overcome the technical errors associated with single-cell sequencing protocols. Although accurate, these methods do not scale to the extensive genomic breadth of single-cell whole-genome (scWGS) and whole-exome sequencing (scWES) data. Results: Here, we report on a new scalable method, Phylovar, which extends the phylogeny-guided variant calling approach to sequencing datasets containing millions of loci. Through benchmarking on simulated datasets under different settings, we show that Phylovar outperforms SCIΦ in terms of running time while being more accurate than Monovar (which is not phylogeny-aware) in SNV detection. Furthermore, we applied Phylovar to two real biological datasets: an scWES triple-negative breast cancer dataset consisting of 32 cells and 3375 loci, and an scWGS dataset of neurons from a normal human brain containing 16 cells and approximately 2.5 million loci. For the cancer data, Phylovar detected somatic SNVs with high or moderate functional impact that were also supported by bulk sequencing data; for the neuron data, Phylovar identified 5745 SNVs with non-synonymous effects, some of which were associated with neurodegenerative diseases. Availability and implementation: Phylovar is implemented in Python and is publicly available at https://github.com/NakhlehLab/Phylovar.