2019
DOI: 10.1093/bioinformatics/btz821

RNASeq_similarity_matrix: visually identify sample mix-ups in RNASeq data using a ‘genomic’ sequence similarity matrix

Abstract: Summary: Mistakes in linking a patient’s biological samples with their phenotype data can confound RNA-Seq studies. The current method for avoiding such sample mix-ups is to test for inconsistencies between biological data and known phenotype data such as sex. However, in DNA studies a common QC step is to check for unexpected relatedness between samples. Here, we extend this method to RNA-Seq, which allows the detection of duplicated samples without relying on identifying inconsistencies with…
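The relatedness check described in the abstract can be illustrated with a toy pairwise similarity matrix. This is a minimal sketch only, not the RNASeq_similarity_matrix implementation: it assumes each sample has already been reduced to a list of genotype calls at the same variant sites (0, 1 or 2 copies of the alternate allele), and the names `MISSING`, `concordance`, `similarity_matrix`, `likely_duplicates` and the 0.95 threshold are all hypothetical choices made for illustration.

```python
from itertools import combinations

MISSING = -1  # hypothetical code for a site with no genotype call


def concordance(a, b):
    """Fraction of sites called in both samples where the genotypes match."""
    shared = [(x, y) for x, y in zip(a, b) if x != MISSING and y != MISSING]
    if not shared:
        return float("nan")  # no overlapping calls, similarity is undefined
    return sum(x == y for x, y in shared) / len(shared)


def similarity_matrix(genotypes):
    """Nested dict of pairwise concordance, keyed by sample name."""
    names = list(genotypes)
    return {
        s: {t: concordance(genotypes[s], genotypes[t]) for t in names}
        for s in names
    }


def likely_duplicates(genotypes, threshold=0.95):
    """Sample pairs whose concordance reaches the (assumed) threshold."""
    return [
        (s, t)
        for s, t in combinations(genotypes, 2)
        if concordance(genotypes[s], genotypes[t]) >= threshold
    ]


# Toy data: samples A and B carry identical calls, C differs at most sites.
samples = {
    "A": [0, 1, 2, 1],
    "B": [0, 1, 2, 1],
    "C": [2, 0, 0, 1],
}
print(likely_duplicates(samples))
```

In this toy input, a pair of samples that should come from different patients but shows near-complete concordance is the kind of signal that would prompt a closer look for a duplicated or mislabelled sample. Real RNA-Seq data would need a tolerance for missing calls and expression-dependent coverage, which this sketch ignores apart from skipping uncalled sites.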

Citations: cited by 3 publications (7 citation statements)
References: 7 publications
“…Our study expands significantly upon the previous findings derived from comparisons of 4 EpKS and contralateral uninvolved control skin samples solely from ART treated patients [29]. As reported by Kist et al [93], one control skin sample in that analysis was even sequenced twice. Here, we show a much more comprehensive analysis of a substantially more encompassing set of KS patients and controls that not only provides the new insights summarized above but has confirmed the overall conclusions of our previous study.…”
Section: Plos Pathogens
supporting
confidence: 80%
“…Others reported that roughly 2% [6] or 3% [1] of all samples analyzed, including samples from a global network of biobanks [6] or publicly available genomic data from multiple studies [1], were paired with the wrong metadata. The high prevalence of these errors, and the recognition that even a small number of errors can negatively impact study integrity, has led to the adoption of quality control protocols designed to detect problematic samples in genomic and transcriptomic studies [1,6–9]. However, the causes of sample misannotation are not limited to genomic analysis.…”
Section: Introduction
mentioning
confidence: 99%
“…The high prevalence of these errors, and the recognition that even a small number of errors can negatively impact study integrity, has led to the adoption of quality control protocols designed to detect problematic samples in genomic and transcriptomic studies [1,6–9]. However, the causes of sample misannotation are not limited to genomic analysis. These types of issues can also occur in cytometry datasets, but there are no established methods for the molecular identification of a sample from information captured within cytometry data.…”
mentioning
confidence: 99%
“…Others reported that roughly 2% (6) or 3% (1) of all samples analyzed, including samples from a global network of biobanks (6) or publicly available genomic data from multiple studies (1), were paired with the wrong metadata. The high prevalence of these errors, and the recognition that even a small number of errors can negatively impact study integrity, has led to the adoption of quality control protocols designed to detect problematic samples in genomic and transcriptomic studies (1,6–9). However, the causes of sample misannotation are not limited to genomic analysis.…”
Section: Introduction
mentioning
confidence: 99%
“…While this approach has several limitations including the inability to identify sample mix-ups that occur between participants of the same sex and potential false positives in participants with sex chromosome anomalies, it has been used to successfully identify annotation errors in publicly available datasets (2). Another widely implemented method utilizes single nucleotide polymorphisms (SNPs) to identify samples collected from the same individual, allowing investigators to detect a larger proportion of errors in longitudinal datasets (1,8,9).…”
Section: Introduction
mentioning
confidence: 99%