Bartender: a fast and accurate clustering algorithm to count barcode reads

Zhao, Lu; Liu, Zhimin; Levy, Sasha F.; Wu, Song

doi:10.1093/bioinformatics/btx655

Cited by 84 publications

(78 citation statements)

References 47 publications


“…Raw sequence files, separated by their fraction-associated barcodes, were processed with Cutadapt [47], outputting the 50 nt UTR and 9-15 nt of the N-terminal of the CDS. UTRs were clustered and UMIs were counted using Bartender [48]. The eGFP library contained approximately 750,000 unique sequences and the mCherry library contained approximately 500,000 sequences.…”

Section: Sequence Processing (mentioning)

confidence: 99%

Human 5′ UTR design and variant effect prediction from a massively parallel translation assay

et al. 2019


The ability to predict the impact of cis-regulatory sequence on gene expression would facilitate discovery in fundamental and applied biology. Here, we combine polysome profiling of a library of 280,000 randomized 5′ UTRs with deep learning to build a predictive model that relates human 5′ UTR sequence to translation. Together with a genetic algorithm, we use the model to engineer new 5′ UTRs that accurately direct specified levels of ribosome loading, providing the ability to tune sequences for optimal protein expression. We show that the same approach can be extended to chemically modified RNA, an important feature for applications in mRNA therapeutics and synthetic biology. We test 35,000 truncated human 5′ UTRs and 3,577 naturally occurring variants and show that the model predicts ribosome loading of these sequences. Finally, we provide evidence of 45 SNVs associated with human diseases that substantially change ribosome loading and thus may represent a molecular basis for disease.

The sequence of the 5′ untranslated region (5′ UTR) is a primary determinant of translation efficiency [1,2]. While many cis-regulatory elements within human 5′ UTRs have been characterized individually, the field still lacks a means to accurately predict protein expression from 5′ UTR sequence alone, limiting the ability to estimate the effects of genome-encoded variants and the ability to engineer 5′ UTRs for precise translation control. Massively parallel reporter assays (MPRAs), methods that assess thousands to millions of sequence variants in a single experiment, coupled with machine learning have proven…

† Corresponding author: gseelig@uw.edu. Author contributions: P.J.S. and B.W. designed and performed experiments, performed data analysis and modeling, and wrote the manuscript. D.R. performed fluorescence validation experiments. V.P. and I.M. wrote the manuscript. D.R.M. helped design polysome profiling. G.S. designed experiments and wrote the manuscript.


“…Lineage tracking from barcode sequencing was reconstructed as described in [41] and using https://github.com/Sherlock-Lab/Barcode_seq/blob/master/bartender_BC1_BC2.py with some minor modifications. Briefly, after extraction of the UMI, and both low and high complexity barcodes from the sequencing read, low complexity barcodes were clustered against their expected sequences, whereas the high complexity barcodes were pooled across all libraries and clustered with bartender (v1.1) [45]. The updated reads and the UMIs were used to derive raw barcode counts, which were assembled into the raw count lineage trajectories.…”

Section: DFE / Mutational Fitness Spectrum U(s) Inference (mentioning)

confidence: 99%
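
The statement above derives raw barcode counts from clustered reads and their UMIs. The following is a minimal sketch of that idea, assuming each read has already been assigned to a barcode cluster and that reads sharing both a cluster and a UMI are PCR duplicates. The function name `count_unique_umis` and the toy read records are hypothetical; this is not the code from the cited script, nor Bartender's own implementation.

```python
from collections import defaultdict


def count_unique_umis(reads):
    """Count distinct UMIs per barcode cluster.

    `reads` is an iterable of (cluster_id, umi) pairs. Reads that share
    both cluster and UMI are treated as PCR duplicates and counted once.
    This exact-match rule is an assumption made for illustration.
    """
    umis_per_cluster = defaultdict(set)
    for cluster_id, umi in reads:
        umis_per_cluster[cluster_id].add(umi)
    return {cluster_id: len(umis) for cluster_id, umis in umis_per_cluster.items()}


# Toy input: two reads of cluster "bc_A" carry the same UMI (one molecule).
reads = [
    ("bc_A", "AACG"),
    ("bc_A", "AACG"),
    ("bc_A", "TTGC"),
    ("bc_B", "GGAT"),
]
counts = count_unique_umis(reads)
```

Repeating this per time point and stacking the per-cluster counts would give the kind of raw count trajectory the statement refers to.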

Changes in the distribution of fitness effects and adaptive mutational spectra following a single first step towards adaptation

Aggeli, Sherlock. 2020. Preprint.


The fitness effects of random mutations are contingent upon the genetic and environmental contexts in which they occur, and this contributes to the unpredictability of evolutionary outcomes at the molecular level. Despite this unpredictability, the rate of adaptation in homogeneous environments tends to decrease over evolutionary time, due to diminishing returns epistasis, causing relative fitness gains to be predictable over the long term. Here, we studied the extent of diminishing returns epistasis and the changes in the adaptive mutational spectra after yeast populations have already taken their first adaptive mutational step. We used three distinct adaptive clones that arose under identical conditions from a common ancestor, from which they diverge by a single point mutation, to found populations that we further evolved. We followed the evolutionary dynamics of these populations by lineage tracking and determined adaptive outcomes using fitness assays and whole genome sequencing. We found compelling evidence for diminishing returns: fitness gains during the 2nd step of adaptation are smaller than those of the 1st step, due to a compressed distribution of fitness effects in the 2nd step. We also found strong evidence for historical contingency at the genic level: the beneficial mutational spectra of the 2nd-step adapted genotypes differ with respect to their ancestor and to each other, despite the fact that the three founders' 1st-step mutations provided their fitness gains due to similar phenotypic improvements. While some targets of selection in the second step are shared with those seen in the common ancestor, other targets appear to be contingent on the specific first step mutation, with more phenotypically similar founding clones having more similar adaptive mutational spectra. Finally, we found that disruptive mutations, such as nonsense and frameshift, were much more common in the first step of adaptation, contributing an additional way that both diminishing returns and historical contingency are evident during 2nd step adaptation.

Stephen Jay Gould argued that historical contingency makes evolutionary outcomes largely unpredictable, and that were we to replay the "tape of life", we would likely end up with a different world each time [1]. However, frequently observed instances of both parallel [2-4] and convergent [5-7] evolution suggest that, at least under some circumstances, adapting populations may simply take different paths to the same peak on a fitness landscape. Environmental similarities, genotypic relatedness and proximity to an optimum in the fitness landscape constitute some of the constraints contributing to convergent or parallel adaptive responses [4,8-21]. Closely related genotypes are often employed to study the effects of evolutionary history on adaptation in various experimental systems [22-30]. A frequent observation is that fitness gains decrease over time during adaptive evolution, termed diminishing returns, most convincingly demonstrated in cases whe...


“…To quantify individual lineages, we isolated the subpopulation containing CNVs from two populations (bc01 and bc02) at multiple timepoints (generations 70, 90, 150, and 270) using fluorescence activated cell sorting (FACS) (Figure 5A). We sequenced barcodes from the CNV subpopulation at each time point and determined the number of unique lineages ([69] and methods). To account for variation in the purity of the isolated CNV subpopulation, we analyzed individual clones from the CNV subpopulation isolated by FACS to estimate a false positive rate, which we find varies as a function of time point (Figure S12B and methods), and applied this correction to barcode counts (Table S10).…”

Section: Glucose-limitation Urea-limitation Glutamine-limitation (mentioning)

confidence: 99%

“…However, the reverse read failed due to over-clustering, so all analyses were performed only using the forward read. We used the Bartender algorithm with UMI handling to account for PCR duplicates and to cluster sequences with merging decisions based solely on distance except in cases of low coverage (<500 reads/barcode), for which the default cluster merging threshold was used [69]. Clusters with a size less than four or with high entropy (>0.75 quality score) were discarded.…”

Section: Quantifying the Number of CNV Lineages (mentioning)

confidence: 99%
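
The filtering rule in the statement above (discard clusters with fewer than four reads or an entropy-based quality score above 0.75) can be illustrated with a short sketch. How the cited work obtained its quality score is not given in the excerpt, so the score here is an assumption: the largest per-position Shannon entropy, in bits, across the equal-length reads of one cluster. The helper names `position_entropy`, `cluster_score`, and `keep_cluster` are hypothetical.

```python
import math
from collections import Counter


def position_entropy(column):
    """Shannon entropy (bits) of the bases observed at one barcode position."""
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in Counter(column).values())


def cluster_score(sequences):
    """Assumed quality score: the largest per-position entropy in the cluster."""
    return max(position_entropy(column) for column in zip(*sequences))


def keep_cluster(sequences, min_size=4, max_entropy=0.75):
    """Keep a cluster only if it is large enough and its reads mostly agree."""
    return len(sequences) >= min_size and cluster_score(sequences) <= max_entropy


clean = ["ACGT"] * 9 + ["ACGA"]       # one mismatch in ten reads
mixed = ["ACGT"] * 5 + ["ACGA"] * 5   # two barcodes merged into one cluster
small = ["ACGT"] * 3                  # below the minimum size

kept = [keep_cluster(c) for c in (clean, mixed, small)]
```

In this toy input only `clean` passes: `mixed` has a position split evenly between two bases (entropy of 1 bit), and `small` has fewer than four reads.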

Single-cell copy number variant detection reveals the dynamics and diversity of adaptation

Lauer, Avecilla, Spealman, et al. 2018. Preprint.


Copy number variants (CNVs) are a pervasive but understudied source of genetic variation and evolutionary potential. Long-term evolution experiments in chemostats provide an ideal system for studying the molecular processes underlying CNV formation and the temporal dynamics of de novo CNVs. Here, we developed a fluorescent reporter to monitor gene amplifications and deletions at a specific locus with single-cell resolution. Using a CNV reporter in nitrogen-limited chemostats, we find that GAP1 CNVs are repeatedly generated and selected during the early stages of adaptive evolution resulting in predictable dynamics of CNV selection. However, subsequent diversification of populations defines a second phase of evolutionary dynamics that cannot be predicted. Using whole genome sequencing, we identified a variety of GAP1 CNVs that vary in size and copy number. Despite GAP1's proximity to tandem repeats that facilitate intrachromosomal recombination, we find that non-allelic homologous recombination (NAHR) between flanking tandem repeats occurs infrequently. Rather, breakpoint characterization revealed that for at least 50% of GAP1 CNVs, origin-dependent inverted-repeat amplification (ODIRA), a DNA replication mediated process, is the likely mechanism. We also find evidence that ODIRA generates DUR3 CNVs, indicating that it may be a common mechanism of gene amplification. We combined the CNV reporter with barcode lineage tracking and found that 10^3-10^4 independent CNV-containing lineages initially compete within populations, which results in extreme clonal interference. Our study introduces a novel means of studying CNVs in heterogeneous cell populations and provides insight into the underlying dynamics of CNVs in evolution.

…condition [32,33]. A high rate of CNV formation suggests that multiple, independent CNV-containing lineages may compete during adaptive evolution resulting in clonal interference, which is characteristic of large, evolving populations [29,34-36]. However, the extent of clonal interference among CNV-containing lineages is unknown. The general amino acid permease, GAP1, is ideally suited to studying the role of CNVs in adaptive evolution. GAP1 encodes a high-affinity transporter for all naturally occurring amino acids and analogues, and it is highly expressed in nitrogen-poor conditions [37,38]. We have previously shown that two classes of CNVs are selected at the GAP1 locus in S. cerevisiae: amplification alleles in glutamine and glutamate-limited chemostats and deletion alleles in urea- and allantoin-limited chemostats [24,25]. GAP1 CNVs are also found in natural populations. Multiple, tandem copies of GAP1 have been identified in wild populations of the nectar yeast, Metschnikowia reukaufii, which result in a competitive advantage over other microbes when amino acids are scarce [39]. As a frequent target of selection in adverse environments in both experimental and natural populations, GAP1 is a model locus for studying the dynamics and mechanisms underlying both gene amplification a...


Bartender: a fast and accurate clustering algorithm to count barcode reads

Abstract: Supplementary data are available at Bioinformatics online.
