2017
DOI: 10.1155/2017/2437608

Metabolomic Biomarker Identification in Presence of Outliers and Missing Values

Abstract: Metabolomics is the sophisticated and high-throughput technology based on the entire set of metabolites which is known as the connector between genotypes and phenotypes. For any phenotypic changes, potential metabolite (biomarker) identification is very important because it provides diagnostic as well as prognostic markers and can help to develop new biomolecular therapy. Biomarker identification from metabolomics data analysis is hampered by the use of high-throughput technology that provides high dimensional…

Cited by 24 publications (14 citation statements)
References 25 publications

“…On the other hand, when missing values are seemingly randomly distributed among study groups, the situation is less critical and the analyst has more freedom to remove highly missing variables, while imputing the rest. Some of the most popular strategies for handling missing data in metabolomics are zero imputation, k‐nearest neighbors (kNN), and random forest (RF) imputation. Several other strategies have been proposed and implemented, although none of them is considered universally optimal.…”
Section: Data Visualization Preprocessing and Analysis (mentioning)
confidence: 99%
“…Some of the most popular strategies for handling missing data in metabolomics are zero imputation, k‐nearest neighbors (kNN), and random forest (RF) imputation. Several other strategies have been proposed and implemented, although none of them is considered universally optimal. In MS, it has been shown that single‐value imputation methods (such as zero, half‐minimum, mean, or median imputation) risk to artificially reduce and skew variable distributions and therefore should not be the first choice.…”
Section: Data Visualization Preprocessing and Analysis (mentioning)
confidence: 99%
“…Since, these two data matrices didn't contain any missing values, therefore, to investigate the efficiency of the proposed technique compared to the other techniques; we randomly incorporated different rates (5%, 10%, 15% and 20%) of missing values and also computed the mean square error (MSE) between the reconstructed data and original data. We also considered two datasets-Hepatocellular Carcinoma (HCC) with 26.52% missing values/cells [27] and MDA-MB-231 breast cancer dataset with 15.81% missing values [28] for evaluating the performance of the proposed missing value imputation method. HCC and MDA-MB-231 dataset are also modified by artificially included various rates (3%, 5%, 7% and 10%) of outliers to investigate the performance of the proposed method.…”
Section: Real Metabolomics Data (mentioning)
confidence: 99%
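The mask-and-score evaluation described in the statement above can be sketched as follows. This is a minimal illustration, not the cited authors' code: the function names, the use of `None` for a missing cell, the column-mean imputer (a stand-in for whichever method is being evaluated), and the choice to average the squared error over all cells are assumptions made for the example.

```python
import random

def mask_at_random(matrix, rate, seed=0):
    """Return a copy of `matrix` with round(rate * n_cells) cells set to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(matrix)) for j in range(len(matrix[0]))]
    hidden = rng.sample(cells, round(rate * len(cells)))
    masked = [row[:] for row in matrix]
    for i, j in hidden:
        masked[i][j] = None
    return masked

def impute_column_mean(masked):
    """Placeholder imputer (assumption): fill None with the observed column mean."""
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[j] is None:
                row[j] = fill
    return filled

def mse(original, reconstructed):
    """Mean squared error over all cells of two equally shaped matrices."""
    n_cells = len(original) * len(original[0])
    total = sum(
        (a - b) ** 2
        for row_o, row_r in zip(original, reconstructed)
        for a, b in zip(row_o, row_r)
    )
    return total / n_cells

def evaluate(matrix, rates=(0.05, 0.10, 0.15, 0.20), seed=0, imputer=impute_column_mean):
    """Hide cells at each rate, impute, and score against the complete matrix."""
    return {rate: mse(matrix, imputer(mask_at_random(matrix, rate, seed))) for rate in rates}
```

Passing a different function as `imputer` lets several imputation methods be compared on the same masked copies, since the mask depends only on `rate` and `seed`.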
“…Too many missing values will cause difficulties for downstream analysis. There are several different methods for this purpose, such as replace by a small values, mean/median, k-nearest neighbor (KNN), probabilistic principal components analysis (PPCA), Bayesian PCA (BPCA) method, and singular value decomposition (SVD) method to impute the missing values (Kumar et al, 2017; Do et al, 2018). In our work, the default method replaces all the missing values with small values (the half of the minimum positive values in the original data) assuming to be the detection limit, and the data were not transformed.…”
Section: Multivariate Statistical Analyses (mentioning)
confidence: 99%
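The half-minimum replacement mentioned in the last statement can be sketched as below. The statement does not say whether the minimum is taken over the whole data matrix or per metabolite; this example assumes the whole matrix. The function name and the use of `None` to mark a missing cell are also assumptions for illustration.

```python
def half_min_impute(matrix):
    """Replace None with half of the smallest positive observed value.

    Assumption: the minimum is taken over the whole matrix, not per column.
    Returns the filled matrix and the fill value that was used.
    """
    positives = [v for row in matrix for v in row if v is not None and v > 0]
    if not positives:
        raise ValueError("no positive observed values to derive a fill value from")
    fill = min(positives) / 2.0
    filled = [[fill if v is None else v for v in row] for row in matrix]
    return filled, fill

# Toy intensity table: rows are samples, columns are metabolites.
data = [[4.0, None], [None, 2.0], [10.0, 0.0]]
filled, fill = half_min_impute(data)
```

Because every missing cell receives the same value, this is a single-value method of the kind the earlier statements caution can narrow and skew a variable's distribution.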