2017
DOI: 10.1101/gr.209601.116

UMI-tools: modeling sequencing errors in Unique Molecular Identifiers to improve quantification accuracy

Abstract: Unique Molecular Identifiers (UMIs) are random oligonucleotide barcodes that are increasingly used in high-throughput sequencing experiments. Through a UMI, identical copies arising from distinct molecules can be distinguished from those arising through PCR amplification of the same molecule. However, bioinformatic methods to leverage the information from UMIs have yet to be formalized. In particular, sequencing errors in the UMI sequence are often ignored or else resolved in an ad hoc manner. We show that err…

Citations: cited by 1,510 publications (1,321 citation statements)
References: 32 publications
“…To precisely quantify and remove PCR duplicated reads, adaptors containing short random sequences are ligated to the fragments during library preparation to uniquely tag each RNA fragment prior to PCR amplification. These unique molecular identifiers (UMIs) enable accurate classification of unique and duplicated reads by comparing the mapped genomic coordinates of reads that contain the same UMI …”
Section: Improving the Recovery Rate Of RNAs Prepared For Sequencing
Citation type: mentioning
confidence: 99%
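The classification described in this statement can be sketched as follows. This is a minimal illustration, not the UMI-tools implementation: the `Read` record and the `dedup_exact` helper are hypothetical names, and the sketch assumes reads are already aligned and that two reads are duplicates only when chromosome, position, strand, and UMI all match exactly.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    """Hypothetical minimal record for an aligned read carrying a UMI."""
    name: str
    chrom: str
    pos: int
    strand: str
    umi: str


def dedup_exact(reads):
    """Keep one read per (chrom, pos, strand, UMI) group.

    Assumption: exact matching only. A sequencing error in the UMI
    puts a read into a separate group, so it is counted as a distinct
    molecule.
    """
    groups = defaultdict(list)
    for read in reads:
        groups[(read.chrom, read.pos, read.strand, read.umi)].append(read)
    # The first read seen in each group stands in for the molecule.
    unique = [group[0] for group in groups.values()]
    n_duplicates = len(reads) - len(unique)
    return unique, n_duplicates


reads = [
    Read("r1", "chr1", 100, "+", "ACGT"),
    Read("r2", "chr1", 100, "+", "ACGT"),  # same coordinates and UMI as r1
    Read("r3", "chr1", 100, "+", "ACGA"),  # UMI differs from r1 by one base
    Read("r4", "chr1", 250, "+", "ACGT"),  # same UMI, different position
]

unique, n_duplicates = dedup_exact(reads)
print([r.name for r in unique], n_duplicates)
```

Under exact matching, `r3` is kept as a separate molecule even though its UMI differs from that of `r1` by a single base. This is the kind of case the abstract raises when it notes that sequencing errors in the UMI sequence are often ignored.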
“…The UMIs category has also seen a big increase recently as UMI based protocols have become commonly used and tools designed to handle the extra processing steps required have been developed (UMI-tools [28], umis [29], zUMIs [30]).…”
Section: Figure 3 (A) Categories Of Tools In the scRNA-tools Database
Citation type: mentioning
confidence: 99%
“…In particular, in protocols where most of the fragmentation occurs after amplification, it is not straightforward to identify the group of potential duplicate reads in which to assess the UMIs, as true duplicate reads may contain sequences from different portions of the same gene. For this reason, UMI-tools [7] can consider all reads mapping to the same gene as potential duplicates. However, this approach discards relevant transcript-level information contained in the read mapping and tends to over-collapse UMIs.…”
Section: Methodological Considerations and Comparisons
Citation type: mentioning
confidence: 99%
“…Each dataset was processed with alevin with default settings (as in Section S1.5) and the runtime and memory usage was compared with that observed for the CellRanger pipeline [2] (v2.1.1) and a custom pipeline using STAR (v2.4.2a) [28], featureCounts (v1.4.6) [29] and UMI-tools (v0.5.3) [7], which represents a commonly used alignment-based approach, which we refer to as naïve [30]. The runtime for alevin is roughly an order of magnitude faster than CellRanger and naive (Figure 3).…”
Section: Comparing Alevin To Existing Approaches
Citation type: mentioning
confidence: 99%