The recent boom in microfluidics and combinatorial indexing strategies, combined with low sequencing costs, has empowered single-cell sequencing technology. Thousands, or even millions, of cells analyzed in a single experiment amount to a data revolution in single-cell biology and pose unique data science problems. Here, we outline eleven challenges that will be central to bringing this emerging field of single-cell data science forward. For each challenge, we highlight motivating research questions, review prior work, and formulate open problems. This compendium is for established researchers, newcomers, and students alike, highlighting interesting and rewarding problems for the coming years.
Hepatitis C is a major public health problem in the United States and worldwide. Outbreaks of hepatitis C virus (HCV) infections are associated with unsafe injection practices, drug diversion, and other exposures to blood, and they are difficult to detect and investigate. Here, we developed and validated a simple approach for molecular detection of HCV transmissions in outbreak settings. We obtained sequences from the HCV hypervariable region 1 (HVR1) using End-Point Limiting-Dilution (EPLD) from 127 cases involved in 32 epidemiologically defined HCV outbreaks and 193 individuals with unrelated HCV strains. We compared several types of genetic distances and calculated a threshold using minimal Hamming distances that identifies transmission clusters in all tested outbreaks with 100% accuracy. The approach was also validated on sequences from 239 individuals obtained using next-generation sequencing, showing the same accuracy as EPLD. On average, nucleotide diversity of the intra-host population was 6.2 times greater in the source than in any incident case, allowing the correct detection of transmission direction in 8 outbreaks for which source cases were known. The simple and accurate distance-based approach for detecting HCV transmissions developed here streamlines molecular investigation of outbreaks, thus improving the public health capacity for rapid and effective control of hepatitis C.
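The distance-based idea in the abstract above can be illustrated with a minimal sketch. It assumes each case is represented by a list of aligned, equal-length HVR1 sequences. The function names, the toy sequences, and the `threshold` value are hypothetical, since the abstract does not state the calibrated threshold. The diversity function is an unweighted mean of pairwise distances, a simplification that ignores variant frequencies.

```python
from itertools import combinations, product


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of mismatched positions between two aligned sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and aligned to equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def minimal_distance(pop_a: list[str], pop_b: list[str]) -> float:
    """Smallest Hamming fraction over all cross-population sequence pairs."""
    return min(hamming_fraction(a, b) for a, b in product(pop_a, pop_b))


def linked(pop_a: list[str], pop_b: list[str], threshold: float) -> bool:
    """Call two cases linked when their minimal distance is within the threshold.

    The threshold is a hypothetical parameter here; it is not taken from the source.
    """
    return minimal_distance(pop_a, pop_b) <= threshold


def nucleotide_diversity(pop: list[str]) -> float:
    """Unweighted mean pairwise Hamming fraction within one intra-host population."""
    pairs = list(combinations(pop, 2))
    if not pairs:
        return 0.0
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


# Toy data (invented for illustration only).
case_1 = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
case_2 = ["ACGTACGTAT", "ACGTACGTAT"]
unrelated = ["TTGTCCGAAC", "TTGTCCGAAT"]

print(linked(case_1, case_2, threshold=0.05))     # shares a variant with case_1
print(linked(case_1, unrelated, threshold=0.05))  # no close variant pair
# The more diverse population is the candidate source in this toy example.
print(nucleotide_diversity(case_1) > nucleotide_diversity(case_2))
```

Taking the minimum over all cross-population pairs means a single shared or near-identical variant is enough to link two cases, which fits a setting where only a few variants are transmitted.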
Hepatitis C virus (HCV) has the propensity to cause chronic infection. Continuous immune escape has been proposed as a mechanism of intrahost viral evolution contributing to HCV persistence. Although the pronounced genetic diversity of intrahost HCV populations supports this hypothesis, recent observations of long-term persistence of individual HCV variants, negative selection increase, and complex dynamics of viral subpopulations during infection as well as broad cross-immunoreactivity (CR) among variants are inconsistent with the immune-escape hypothesis. Here, we present a mathematical model of intrahost viral population dynamics under the condition of a complex CR network (CRN) of viral variants and examine the contribution of CR to establishing persistent HCV infection. The model suggests a mechanism of viral adaptation by antigenic cooperation (AC), with immune responses against one variant protecting other variants. AC reduces the capacity of the host's immune system to neutralize certain viral variants. CRN structure determines specific roles for each viral variant in host adaptation, with variants eliciting broad-CR antibodies facilitating persistence of other variants immunoreacting with these antibodies. The proposed mechanism is supported by empirical observations of intrahost HCV evolution. Interference with AC is a potential strategy for interruption and prevention of chronic HCV infection.
hepatitis C | viral quasispecies | cross-immunoreactivity | complex network | intrahost adaptation
Hepatitis C virus (HCV) causes chronic infection in ~70% of infected people, who become at risk for developing severe liver diseases (1). The virus establishes chronic infection by using several molecular mechanisms for averting innate immunity and attenuating effectiveness of adaptive immune responses (2). HCV is one of the most heterogeneous viruses infecting humans and exists in each infected host as a population of genetically related variants (3, 4).
Substantial heterogeneity and drastic changes in genetic composition of the intrahost HCV population observed during chronic infections have been interpreted as evidence of a continuous immune escape via random mutations, thereby generating increasing genetic diversity of viral populations in infected individuals (5-7). The observed cross-immunoreactivity (CR) of HCV variants from earlier stages of infection with antibodies (Abs) from later stages and ineffectiveness of Abs to immunoreact with variants from the same stage of infection (7) seemingly support the hypothesis of immune escape as a mechanism of intrahost evolution that contributes to establishment of persistent infections. However, several recent observations are incompatible with this hypothesis. First, the intrahost HIV population diversifies and diverts continuously from acute state to chronic infection until, at the onset of immunodeficiency, it starts losing heterogeneity and eventually stops diverting (8). Surprisingly, a similar temporal pattern of diversity and diversion was observed for intraho...
Supplementary data are available at Bioinformatics online.
Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing.
Results: In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones.
Conclusions: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm
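As a rough illustration of the general idea behind k-mer-based error detection, the sketch below counts k-mers across all reads and flags reads that contain a rare k-mer. This is not the published KEC algorithm, which the abstract does not describe in detail. The function names and the parameters `k` and `min_count` are assumptions chosen for the example, and the sketch only flags suspect reads without correcting them.

```python
from collections import Counter


def kmers(read: str, k: int) -> list[str]:
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer occurrence across the whole read set."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def suspect_reads(reads: list[str], k: int, min_count: int) -> list[str]:
    """Return reads containing at least one k-mer seen fewer than min_count times.

    Rare k-mers are treated as likely sequencing errors; min_count is a
    hypothetical cutoff, not a value from the source.
    """
    counts = kmer_counts(reads, k)
    return [r for r in reads if any(counts[km] < min_count for km in kmers(r, k))]


# Toy data: five identical reads and one read with a single substitution.
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
print(suspect_reads(reads, k=4, min_count=2))
```

A fixed cutoff like this can discard genuine low-frequency variants, which is one reason the abstract argues for calibrating error correction to sequence-specific factors such as homopolymers and position.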
Aligning sequencing reads onto a reference is an essential step of the majority of genomic analysis pipelines. Computational algorithms for read alignment have evolved in accordance with technological advances, leading to today’s diverse array of alignment methods. We provide a systematic survey of algorithmic foundations and methodologies across 107 alignment methods, for both short and long reads. We provide a rigorous experimental evaluation of 11 read aligners to demonstrate the effect of these underlying algorithms on speed and efficiency of read alignment. We discuss how general alignment algorithms have been tailored to the specific needs of various domains in biology.
Hepatitis C virus (HCV) causes chronic infection in 50% to 80% of infected individuals. Hypervariable region 1 (HVR1) variability is frequently studied to gain an insight into the mechanisms of HCV adaptation during chronic infection, but the changes to and persistence of HCV subpopulations during intrahost evolution are poorly understood. In this study, we used ultradeep pyrosequencing (UDPS) to map the viral heterogeneity of a single patient over 9.6 years of chronic HCV genotype 4a infection. Informed error correction of the raw UDPS data was performed using a temporally matched clonal data set. The resultant data set reported the detection of low-frequency recombinants throughout the study period, implying that recombination is an active mechanism through which HCV can explore novel sequence space. The data indicate that polyvirus infection of hepatocytes has occurred but that the fitness quotients of recombinant daughter virions are too low for the daughter virions to compete against the parental genomes. The subpopulations of parental genomes contributing to the recombination events highlighted a dynamic virome where subpopulations of variants are in competition. In addition, we provide direct evidence that demonstrates the growth of subdominant populations to dominance in the absence of a detectable humoral response.
IMPORTANCE: Analysis of ultradeep pyrosequencing data sets derived from virus amplicons frequently relies on software tools that are not optimized for amplicon analysis, assume random incorporation of sequencing errors, and are focused on achieving higher specificity at the expense of sensitivity. Such analysis is further complicated by the presence of hypervariable regions. In this study, we made use of a temporally matched reference sequence data set to inform error correction algorithms. Using this methodology, we were able to (i) detect multiple instances of hepatitis C virus intrasubtype recombination at the E1/E2 junction (a phenomenon rarely reported in the literature) and (ii) interrogate the longitudinal quasispecies complexity of the virome. Parallel to the UDPS, isolation of IgG-bound virions was found to coincide with the collapse of specific viral subpopulations.