2019
DOI: 10.3390/ht8010004

A Selective Review of Multi-Level Omics Data Integration Using Variable Selection

Abstract: High-throughput technologies have been used to generate a large amount of omics data. In the past, single-level analysis has been extensively conducted where the omics measurements at different levels, including mRNA, microRNA, CNV and DNA methylation, are analyzed separately. As the molecular complexity of disease etiology exists at all different levels, integrative analysis offers an effective way to borrow strength across multi-level omics data and can be more powerful than single-level analysis. In this ar…

Citations: cited by 151 publications (132 citation statements)
References: 129 publications
“…Moreover, unlike other integrated modeling which combined features from different platforms either horizontally or hierarchically [7], one unique advantage of HetEnc is that it does not require multi-platform data for the test samples. HetEnc is designed to use multi-platform data to train two heteroencoding networks in the feature representation step, which is totally unsupervised; therefore, no labeling information (i.e., the endpoint) is needed for utilizing the multi-platform data.…”
Section: Discussion (mentioning)
confidence: 99%
“…In general, gene features were pre-filtered by their p-value (<0.05) and log2 fold-change (>1.5). Parameter K in KNN ranged in (1,3,5,7,9); kernel used in SVM is 'rbf'; and the other parameters are set as default. In training process, each model was trained based on randomly selected 70% of training data and its performance was evaluated on the remaining 30% of training data.…”
Section: Other Machine Learning Algorithms (mentioning)
confidence: 99%
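
The evaluation protocol quoted above (feature pre-filtering, a sweep over K for KNN, and a random 70/30 split of the training data) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the citing authors' code. All function names and the toy data are hypothetical, and treating the fold-change cut-off as an absolute value is an assumption, since the quoted text only says ">1.5". The SVM with an 'rbf' kernel mentioned in the quote is not reproduced here.

```python
import math
import random
from collections import Counter


def prefilter(p_values, log2_fc, p_cut=0.05, fc_cut=1.5):
    """Return indices of features with p < p_cut and |log2 FC| > fc_cut.

    Using the absolute fold-change is an assumption, not stated in the quote.
    """
    return [i for i, (p, fc) in enumerate(zip(p_values, log2_fc))
            if p < p_cut and abs(fc) > fc_cut]


def split_70_30(samples, labels, seed=0):
    """Randomly keep 70% of the training data and hold out the remaining 30%."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    cut = int(round(0.7 * len(idx)))
    fit, held = idx[:cut], idx[cut:]
    return ([samples[i] for i in fit], [labels[i] for i in fit],
            [samples[i] for i in held], [labels[i] for i in held])


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy_by_k(samples, labels, ks=(1, 3, 5, 7, 9), seed=0):
    """Held-out accuracy for each K, all evaluated on the same 70/30 split."""
    fit_x, fit_y, held_x, held_y = split_70_30(samples, labels, seed)
    result = {}
    for k in ks:
        hits = sum(knn_predict(fit_x, fit_y, q, k) == y
                   for q, y in zip(held_x, held_y))
        result[k] = hits / len(held_y)
    return result


def make_toy_data(n_per_class=40, seed=1):
    """Two well-separated 2-D clusters; purely illustrative, not real omics data."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, center in ((0, 0.0), (1, 10.0)):
        for _ in range(n_per_class):
            samples.append([center + rng.uniform(-1, 1),
                            center + rng.uniform(-1, 1)])
            labels.append(label)
    return samples, labels


if __name__ == "__main__":
    x, y = make_toy_data()
    for k, acc in accuracy_by_k(x, y).items():
        print(f"K={k}: held-out accuracy {acc:.2f}")
```

In practice the pre-filter would be applied to the gene features before the split-and-sweep step; the two are kept separate here so each can be read on its own.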
“…The lack of common methodologies and terminologies can transform this synergy into a further level of complexity in the process of data integration (51). As observed in (52,53), specific technological limits, noise levels and variability ranges affect the different omics, and thus confounding the underlying biological signals, yielding that really integrative analysis is still very rare, while different methods often discover different kinds of patterns, as evidenced by the lack of consistency in the published results, although efforts in this direction have started appearing (54,55).…”
Section: Background and Related Work (mentioning)
confidence: 99%
“…In the early-integration approach, also known as juxtaposition-based, the multi-omics datasets are first concatenated into one matrix. To deal with the high-dimensionality of the joint dataset, these methods generally adopt matrix factorization (68,53,55,52), statistical (46,69,70,59,57,44,71,72,73,55), and machine learning tools (74,73,55). Although the dimensionality reduction procedure is necessary and may improve the predictive performance, it can also cause the loss of key information (66).…”
Section: Background and Related Work (mentioning)
confidence: 99%
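
The early-integration idea described in the last statement (concatenate the omics matrices, then reduce the dimensionality of the joint matrix) can be illustrated with a small sketch. It assumes NumPy is available and uses a plain truncated SVD as a stand-in for the matrix factorization methods the statement cites; it is not any of those specific methods. The function name `early_integration`, the per-feature standardization step, and the random toy matrices are assumptions made for illustration.

```python
import numpy as np


def early_integration(blocks, n_components=2):
    """Concatenate samples-by-features omics blocks, then reduce with truncated SVD.

    Each block is standardized per feature first (an assumption) so that no
    single platform dominates the joint matrix purely because of its scale.
    """
    scaled = []
    for block in blocks:
        block = np.asarray(block, dtype=float)
        centered = block - block.mean(axis=0)
        std = centered.std(axis=0)
        std[std == 0] = 1.0  # leave constant features at zero instead of dividing by 0
        scaled.append(centered / std)

    # Juxtaposition step: same samples (rows), features placed side by side.
    joint = np.hstack(scaled)

    # Dimensionality reduction: keep the leading singular directions only.
    u, s, _ = np.linalg.svd(joint, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    return joint, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mrna = rng.normal(size=(6, 5))         # toy data: 6 samples, 5 expression features
    methylation = rng.normal(size=(6, 3))  # toy data: same 6 samples, 3 methylation features
    joint, scores = early_integration([mrna, methylation])
    print(joint.shape, scores.shape)
```

Keeping only a few components is what makes the joint matrix tractable, and it is also where the information loss noted in the statement can occur: variation outside the retained components is discarded.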