2023
DOI: 10.1101/2023.01.12.523792
Preprint

Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning

Abstract: Imputation techniques provide means to replace missing measurements with a value and are used in almost all downstream analysis of mass spectrometry (MS)-based proteomics data using label-free quantification (LFQ). Some methods only impute assuming the limit of detection (LOD) was not passed and therefore impute missing values with too low or too high intensities, potentially leading to biased results in downstream statistical analysis. Here we test how self-supervised deep learning models can impute missing v…

Citations: cited by 7 publications (13 citation statements)
References: 86 publications
“…Deep neural networks have proven highly generalizable in other contexts. Recent "deep" impute methods may be a step in the right direction [20], though much work remains to be done. In the future, data-driven imputation methods will likely be broadly adopted as part of general signal processing workflows in proteomics.…”
Section: Discussion (mentioning)
confidence: 99%
“…On the basis of the resulting citation counts (Figure 1), we selected four of the most popular imputation methods: k-nearest neighbor (kNN) [3], MissForest [11], Gaussian sampling [9], and low value replacement (Figure 1). We also include a non-negative matrix factorization (NMF) imputation method, which has recently been proposed for proteomics [18][19][20]. By focusing only on the most commonly used imputation methods, our aim is to provide a practical comparison that will be beneficial to experimental proteomicists.…”
Section: Introduction (mentioning)
confidence: 99%
“…We hypothesized that this data could be used to explore self-supervised deep learning models for imputing MS-based proteomics data on the basis of ionised peptides generated from denatured proteins. Therefore in a companion paper we adapted three deep learning architectures for imputation of proteomics data, defined a workflow for model comparison and analysed the effect of imputation using different methods on a downstream analysis task 3.…”
Section: Background and Summary (mentioning)
confidence: 99%
“…From the MaxQuant search of each file, we dumped the tab separated files to PRIDE along the associated raw file. In a companion paper we used the "evidence.txt" for precursor quantifications, "peptides.txt" for aggregated peptides and "proteinGroups.txt" for protein groups referencing them by their gene group 3.…”
Section: Processing Steps of a Single Raw File (mentioning)
confidence: 99%
“…On the basis of the resulting citation counts (Figure ), we selected four of the most popular imputation methods: k -nearest neighbor (kNN), MissForest, Gaussian sampling, and low value replacement. We also include a non-negative matrix factorization (NMF) imputation method, which has recently been proposed for proteomics. By focusing on only the most commonly used imputation methods, our aim is to provide a practical comparison that will be beneficial to experimental proteomicists. For this reason, seldom used R packages (e.g., imp4p, impSeqRob, and QRLIC) have been omitted from our analysis.…”
Section: Introduction (mentioning)
confidence: 99%
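
For orientation, two of the simpler methods named in the statement above, low value replacement and Gaussian sampling, can be sketched in a few lines. This is a minimal illustration and not the implementation used by any of the cited papers: the function names, the toy log2 intensities, and the `shift` and `scale` parameters are assumptions chosen for the example. Both functions treat every missing value as falling below the limit of detection, which is the assumption the abstract warns can bias downstream statistics.

```python
import random
import statistics


def low_value_replacement(values):
    """Replace missing entries (None) with the smallest observed value."""
    observed = [v for v in values if v is not None]
    floor = min(observed)
    return [floor if v is None else v for v in values]


def gaussian_sampling(values, shift=1.8, scale=0.3, seed=0):
    """Replace missing entries with draws from a down-shifted, narrowed normal.

    shift and scale are illustrative assumptions (in units of the observed
    standard deviation), not values taken from the cited papers.
    """
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        rng.gauss(mu - shift * sigma, scale * sigma) if v is None else v
        for v in values
    ]


# Hypothetical log2 intensities for one protein group across six samples.
log2_intensities = [25.1, None, 27.3, 26.0, None, 28.4]
filled_low = low_value_replacement(log2_intensities)
filled_gauss = gaussian_sampling(log2_intensities)
```

Neither function uses information from other protein groups or samples, whereas kNN, MissForest, matrix factorization, and the deep learning models discussed in the statements above all draw on the rest of the data matrix.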