2019
DOI: 10.1186/s12859-019-2641-8

Shambhala: a platform-agnostic data harmonizer for gene expression data

Abstract: Background: Harmonization techniques make different gene expression profiles and their sets compatible and ready for comparisons. Here we present a new bioinformatic tool termed Shambhala for harmonization of multiple human gene expression datasets obtained using different experimental methods and platforms of microarray hybridization and RNA sequencing. Results: Unlike previously published methods enabling good quality data harmonization for only two datasets, Shambhala allows conversion of multiple datasets into …

Cited by 30 publications (52 citation statements)
References: 37 publications
“…Another possibility is to aggregate different smaller datasets into bigger ones. For such aggregation, a new harmonizing technique, which is capable to merge arbitrary number of datasets obtained using arbitrary experimental platforms [46], may be applied.…”
Section: Discussion (mentioning)
confidence: 99%
“…We selected two benchmark data sets previously used in similar studies (Rudy and Valafar, 2011;Borisov et al, 2019). The first set (here called the reference data set) originates from projects MAQC (MAQC-I) (MAQC Consortium, 2006) and SEQC/MAQC-III (SEQC/MAQC-III Consortium, 2014), which made use of reference RNA samples to assess repeatability of gene-expression microarray data within a specific site, reproducibility across multiple sites and comparability across multiple platforms.…”
Section: The Data Sets (mentioning)
confidence: 99%
“…These biosamples had been analyzed using different platforms and in different sites, as described (MAQC Consortium, 2006;SEQC/MAQC-III Consortium, 2014). Following the work from Rudy and Valafar (2011) and Borisov et al (2019), we selected data from six of the platforms (between parentheses, data-set identifier in this study, GEO platform ID and project of origin): Note that in the MAQC-I study the following microarrays from AG1 were discarded as outliers after the Agilent's Feature Extraction QC Report: AG1_1_A1, AG1_2_A3, AG1_2_D2, AG1_3_B3. Since the data for these microarrays is nevertheless deposited and we wanted our analysis to be as independent as possible of platform-dependent data-preprocessing steps, we considered also their inclusion.…”
Section: The Reference Data Set (mentioning)
confidence: 99%
“…However, RNA sequencing datasets obtained using different equipment, reagents and protocols may be poorly compatible with each other (Buzdin et al 2014;Borisov et al 2019) and ideally the same experimental platform should be used to compare the results (Borisov et al 2019). We recently published an annotated database of RNA sequencing profiles termed Oncobox Atlas of Normal Tissue Expression (ANTE) (Suntsova et al 2019) that represents 142 solid tissue samples from human healthy donors killed in road accidents, and seventeen blood samples from healthy volunteers.…”
Section: Introduction (mentioning)
confidence: 99%