2023
DOI: 10.1093/bioinformatics/btad021

Dealing with dimensionality: the application of machine learning to multi-omics data

Abstract: Motivation: Machine learning (ML) methods are motivated by the need to automate information extraction from large data sets in order to support human users in data-driven tasks. This is an attractive approach for integrative joint analysis of vast amounts of omics data produced in next generation sequencing and other -omics assays. A systematic assessment of the current literature can help to identify key trends and potential gaps in methodology and applications. We surveyed the literature on …

Cited by 36 publications (31 citation statements)
References: 44 publications
“…More so, using the entire feature space without considering the relevance of the individual features hinders achieving an optimal model performance. Given that genomic datasets suffer from the curse of dimensionality [6,8,29-32], it is crucial to eliminate irrelevant features and retain only the most informative variants related to the phenotype under investigation. Removing noise from the data improves models' accuracy and reliability, thereby gaining a deeper understanding of the genetic mechanisms underlying risk susceptibility.…”
Section: Discussion (mentioning)
confidence: 99%
“…Machine learning (ML) is a widely accepted methodical framework in analyzing high-dimensional and complex data [6,8,29-32], owing to its unparalleled ability to handle high-volume data and uncover implicit and nonlinear patterns that are pertinent for predictive modeling. By selecting a minimum subset of individually relevant and neighboring features while minimizing information loss [33], ML captures complex interactions, leading to the identification of highly-predictive features.…”
Section: Introduction (mentioning)
confidence: 99%
“…A large number of features is a common problem in predictive machine learning models [38]. Thus, similar to a previous study of microbiome-based prediction of all-cause mortality [21], we found the approach to compress the data into a smaller number of informative characteristics by aggregating species into co-abundance networks provided a strategy for feature engineering in microbiome-based risk prediction.…”
Section: Discussion (mentioning)
confidence: 99%
“…These articles primarily aim to summarize the latest trends and developments in data modalities, feature engineering methods, and AI models specifically related to survival prediction. However, the focus of these reviews is often constrained to either a singular disease or multiple subtypes of cancer, highlighting a limited scope within the broader landscape of survival prediction research [37,44-48]. More comprehensive details about the scope of existing review articles in terms of contributions and drawbacks are summarised in Table 1 and section .…”
Section: Introduction (mentioning)
confidence: 99%