2017
DOI: 10.1186/s12859-017-1645-5

Automated multigroup outlier identification in molecular high-throughput data using bagplots and gemplots

Abstract: Background: Analyses of molecular high-throughput data often lack in robustness, i.e. results are very sensitive to the addition or removal of a single observation. Therefore, the identification of extreme observations is an important step of quality control before doing further data analysis. Standard outlier detection methods for univariate data are however not applicable, since the considered data are high-dimensional, i.e. multiple hundreds or thousands of features are observed in small samples. Usually, out…

Cited by 38 publications (32 citation statements)
References 48 publications
“…Zani et al (1998) applied convex hull peeling, which is somewhat less robust than half-space depth, as shown by Donoho and Gasko (1992). The notion of bagplots is based in Tukey's halfspace depth which has no restriction for any dimensions (Kruppa & Jung, 2017; Rousseeuw et al, 1999). The factor with which the bagplot is plotted poses another advantage of our proposal because the occurrence probability of false alarms is theoretically reduced to 0.002 (Rousseeuw et al, 1999) leading in reduction of loss due to shrinkage.…”
Section: Discussion (mentioning)
confidence: 99%
“…High throughput sequencing of gene expression data (HTSeq-counts) and clinical information of 323 control cases and 471 cases of melanoma cases were downloaded from TCGA official website. Expression of HSD11B2 differences for discrete variables were visualized using boxplots and whiskers plot [17].…”
Section: RNA-Sequencing Patient Data and Bioinformatics Analysis (mentioning)
confidence: 99%
“…We selected patients with pathological type of adenomas or adenocarcinomas. The expression differences for discrete variables were visualized using Boxplots [8].…”
Section: Downloading mRNA Data and Analysis (mentioning)
confidence: 99%