2023
DOI: 10.1101/2023.04.07.535980
Preprint

Evaluating proteomics imputation methods with improved criteria

Abstract: Quantitative measurements produced by tandem mass spectrometry proteomics experiments typically contain a large proportion of missing values. This missingness hinders reproducibility, reduces statistical power, and makes it difficult to compare across samples or experiments. Although many methods exist for imputing missing values in proteomics data, in practice, the most commonly used methods are among the worst performing. Furthermore, previous benchmarking studies have focused on relatively simple measuremen…
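The truncated abstract does not spell out the improved criteria, so the sketch below only illustrates the generic mask-and-score procedure that many imputation benchmarks start from: hide a fraction of the observed values, impute them, and measure the error against the hidden truth. All names (`mask_observed`, `column_mean_impute`, `masked_rmse`) are hypothetical, and the column-mean imputer is only a placeholder baseline, not a method evaluated in the preprint.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # rows = samples, columns = proteins; None = missing
Cell = Tuple[int, int]

def mask_observed(data: Matrix, fraction: float, seed: int = 0) -> Tuple[Matrix, List[Cell]]:
    """Hide a random fraction of the observed (non-missing) cells; return the masked copy and hidden cells."""
    rng = random.Random(seed)
    observed = [(i, j) for i, row in enumerate(data)
                for j, v in enumerate(row) if v is not None]
    hidden = rng.sample(observed, int(len(observed) * fraction))
    masked = [row[:] for row in data]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden

def column_mean_impute(data: Matrix) -> List[List[float]]:
    """Placeholder imputer: fill each missing cell with its column's observed mean
    (falls back to the grand mean if a column has no observed values)."""
    all_obs = [v for row in data for v in row if v is not None]
    grand = sum(all_obs) / len(all_obs)
    means = []
    for j in range(len(data[0])):
        vals = [row[j] for row in data if row[j] is not None]
        means.append(sum(vals) / len(vals) if vals else grand)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]

def masked_rmse(data: Matrix, imputer: Callable[[Matrix], List[List[float]]],
                fraction: float = 0.2, seed: int = 0) -> float:
    """Root-mean-square error of an imputer on cells whose true values are known.
    Assumes at least one cell gets hidden (fraction * n_observed >= 1)."""
    masked, hidden = mask_observed(data, fraction, seed)
    filled = imputer(masked)
    errors = [(filled[i][j] - data[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(errors) / len(errors))

# Hypothetical log2 intensities: 4 samples x 3 proteins.
data = [
    [20.1, 18.3, None],
    [21.0, None, 15.2],
    [19.8, 18.9, 15.9],
    [20.5, 18.1, 16.1],
]
print(masked_rmse(data, column_mean_impute, fraction=0.25))
```

Masking entries uniformly at random mimics values that are missing completely at random; proteomics missingness is often intensity-dependent, so a benchmark that only masks at random can paint an incomplete picture of how a method behaves on real data.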

Cited by 4 publications (3 citation statements); references: 54 publications.
“…A recent analysis of imputation strategies for proteomics data identified MissForest,[32] a Random Forest-based approach, as the current best-in-class method for data imputation.[33] We thus implemented a version of MissForest within glycowork. This machine learning-based imputation strategy, in contrast to single-value imputations (e.g., replacing all missing values with 0.1) that are commonly used in glycomics data, does not profoundly affect the underlying distribution of glycan abundances and thus should be more robust to artifacts.…”
Section: Results (mentioning; confidence: 99%)
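The claim that single-value imputation distorts the abundance distribution is easy to check on a toy example. The sketch below uses hypothetical relative abundances for one glycan and a hypothetical helper `fixed_value_impute`; it does not reproduce glycowork's MissForest implementation. For a MissForest-style imputer in Python, scikit-learn's `IterativeImputer` with a tree-ensemble estimator follows a similar idea, but that is an alternative, not the code the quoted authors describe.

```python
import statistics
from typing import List, Optional

def fixed_value_impute(values: List[Optional[float]], fill: float = 0.1) -> List[float]:
    """Single-value imputation: replace every missing entry (None) with the same constant."""
    return [fill if v is None else v for v in values]

# Hypothetical relative abundances (%) of one glycan across 8 samples; None = not detected.
abundances = [12.4, 10.9, None, 11.7, None, 13.2, None, 12.0]

observed = [v for v in abundances if v is not None]
filled = fixed_value_impute(abundances, fill=0.1)

# Compare location and spread before and after imputation.
print(f"observed only : mean={statistics.mean(observed):.2f}  sd={statistics.stdev(observed):.2f}")
print(f"after 0.1 fill: mean={statistics.mean(filled):.2f}  sd={statistics.stdev(filled):.2f}")
```

With three of eight values set to 0.1, the mean drops below every detected abundance and the standard deviation grows several-fold, which is the kind of distortion the quoted passage attributes to single-value imputation.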
“…Ideally, one should avoid imputation when performing an SCP data analysis (Table ). Imputation of missing values using an unsuitable model can lead to biased estimates and introduces false signals. Which model is suitable for which type of data is still actively benchmarked and debated in the fields of scRNA-Seq and bulk proteomics. In scRNA-Seq, imputation can have dramatic effects on clustering and leads to strong artifacts in the gene expression landscape depicted by t-SNE or UMAP projections. Another argument against imputation is that imputation methods may cause oversmoothing, i.e. they remove biological heterogeneity because they directly or indirectly combine observed values.…”
Section: To Impute or Not To Impute? (mentioning; confidence: 99%)
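To make the oversmoothing argument concrete, the sketch below implements a minimal k-nearest-neighbour imputer in plain Python. `knn_impute` is a hypothetical helper written for illustration, not the method of any study cited here; the toy data are invented. It shows that because imputed values are averages of donor values, increasing k pulls them toward a common value and erases differences between groups of samples.

```python
import math
import statistics
from typing import List, Optional

Matrix = List[List[Optional[float]]]  # rows = samples/cells, columns = proteins; None = missing

def knn_impute(data: Matrix, k: int = 2) -> Matrix:
    """Fill each missing cell with the mean of the same column in the k nearest rows.

    Distance is the root-mean-square difference over columns observed in both rows;
    only rows that observe the target column are eligible donors.
    """
    filled = [row[:] for row in data]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = []
            for r, other in enumerate(data):
                if r == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                    donors.append((dist, other[j]))
            donors.sort()
            if donors:  # leave the cell missing if no donor exists
                filled[i][j] = statistics.mean(val for _, val in donors[:k])
    return filled

# Two hypothetical groups: rows 0, 1, 4 are "low", rows 2, 3, 5 are "high"; column 2 is partly missing.
data = [
    [1.0, 1.0, 10.0],
    [1.2, 0.9, 12.0],
    [5.0, 5.1, 50.0],
    [5.2, 4.9, None],
    [1.1, 1.0, None],
    [5.1, 5.0, None],
]

for k in (1, 3):
    imputed = [row[2] for row in knn_impute(data, k=k)[3:]]
    print(f"k={k}: imputed column-2 values = {imputed}")
```

With k=1 the two "high" rows receive 50.0 and the "low" row receives 10.0; with k=3 every missing cell receives 24.0, the mean of all three donors, so the group difference in the imputed cells disappears.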
“…Following this line of thought, several studies focus on determining what works best for different causes of missing values using some form of simulation [8,13]. Other studies focus their analysis on post-translational modifications [15], the best combination of software tools, datasets and imputation method [16], normalization and batch effects correction [17] or downstream analysis [18,19]. Other methods have been developed to handle specific missing mechanisms, for instance, random imputation, fixed value imputation such as limit of detection or x-quantile of feature, model-based imputation using k-nearest neighbor (KNN), linear models [13] or tree-based models [14].…”
Section: Introduction (mentioning; confidence: 99%)
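The fixed-value and random strategies listed in this excerpt are simple enough to sketch directly. The functions below are hypothetical helpers operating on one protein's log2 intensities. The quantile convention, the use of the minimum observed value as a limit-of-detection stand-in, and the down-shifted normal used for "random imputation" (a variant popularised by the Perseus software) are assumptions, and the `shift` and `width` values are illustrative rather than recommended.

```python
import random
import statistics
from typing import List, Optional

Column = List[Optional[float]]  # one protein's log2 intensities across samples; None = missing

def quantile(values: List[float], q: float) -> float:
    """q-quantile by linear interpolation between order statistics (numpy's default rule)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def quantile_impute(col: Column, q: float = 0.01) -> List[float]:
    """Fixed-value imputation with the feature's q-quantile.

    q=0 gives the smallest observed value, one simple stand-in for the limit of detection.
    """
    fill = quantile([v for v in col if v is not None], q)
    return [fill if v is None else v for v in col]

def downshift_random_impute(col: Column, shift: float = 1.8, width: float = 0.3,
                            seed: int = 0) -> List[float]:
    """Random imputation: draw each missing value from a normal distribution centred
    `shift` SDs below the observed mean, with SD equal to `width` times the observed SD.
    Requires at least two observed values."""
    obs = [v for v in col if v is not None]
    mu, sd = statistics.mean(obs), statistics.stdev(obs)
    rng = random.Random(seed)
    return [rng.gauss(mu - shift * sd, width * sd) if v is None else v for v in col]

col = [20.0, 22.0, None, 24.0, None]
print(quantile_impute(col, q=0.0))       # limit-of-detection style fill
print(quantile_impute(col, q=0.5))       # median fill
print(downshift_random_impute(col, seed=1))
```

Both strategies assume the missing values sit at the low end of the intensity range; the KNN, linear-model and tree-based approaches named in the excerpt instead borrow information from other samples or features.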