2018
DOI: 10.1016/j.plantsci.2017.10.014

Raising orphans from a metadata morass: A researcher's guide to re-use of public ’omics data

Abstract: More than 15 petabases of raw RNAseq data is now accessible through public repositories. Acquisition of other 'omics data types is expanding, though most lack a centralized archival repository. Data-reuse provides tremendous opportunity to extract new knowledge from existing experiments, and offers a unique opportunity for robust, multi-'omics analyses by merging metadata (information about experimental design, biological samples, protocols) and data from multiple experiments. We illustrate how predictive rese…

Citations: cited by 20 publications (25 citation statements)
References: 85 publications (107 reference statements)
“…Using RNA-Seq data with high representation of orphan gene expression dramatically improves the efficacy of gene prediction by MAKER, by the Direct Inference pipeline, and by the BIND and MIND combination pipelines. Thus, the accuracy of an initial genome annotation will be maximized by selecting RNA-Seq data from diverse samples, including samples that typically express high levels of young genes, e.g., reproductive tissues and stressed tissues [10, 14, 54, 72-78, 83]. Additionally, reannotation [70, 84, 85] outcomes can be optimized by utilizing evidence from diverse samples in the expanded body of public RNA-Seq datasets.…”
Section: Discussion
confidence: 99%
“…The "orphan-rich" dataset is designed to maximize orphan gene representation. We reasoned that selecting samples which contain a breadth of orphan gene transcripts could be important because many, though by no means all [10], orphans are highly expressed under only a very limited set of conditions [10, 14, 54, 72-78]. The "orphan-rich" RNA-Seq samples were chosen from over a thousand samples in SRA; over 60% of all Araport11-annotated orphans are transcribed in each sample.…”
Section: Rna-seq and Protein Used As Extrinsic Evidence For Gene Pred
confidence: 99%
“…For example, a researcher might wish to add fields for references, dates, notes, or keywords. A caveat is that leveraging archived metadata is only as good as the metadata provided; metadata about the samples may be incomplete or misleading, and the quality varies from study to study (7,36). Despite this, metadata is key to understanding the significance of the experiments (7,36).…”
Section: B2 Sample Metadata Files For a New Mog Project
confidence: 99%
“…Integrative analysis of data from the multiple studies representing diverse biological conditions is key to fully exploit these vast data resources for scientific discovery (5,6). Such analysis allows efficient reuse and recycling of these available data and its metadata (1,5,7,8). Higher statistical power…”
Section: Introduction
confidence: 99%