Mass spectrometry-based proteomic technique has become indispensable in current exploration of complex and dynamic biological processes. Instrument development has largely ensured the effective production of proteomic data, which necessitates commensurate advances in statistical framework to discover the optimal proteomic signature. Current framework mainly emphasizes the generalizability of the identified signature in predicting the independent data but neglects the reproducibility among signatures identified from independently repeated trials on different sub-dataset. These problems seriously restricted the wide application of the proteomic technique in molecular biology and other related directions. Thus, it is crucial to enable the generalizable and reproducible discovery of the proteomic signature with the subsequent indication of phenotype association. However, no such tool has been developed and available yet. Herein, an online tool, POSREG, was therefore constructed to identify the optimal signature for a set of proteomic data. It works by (i) identifying the proteomic signature of good reproducibility and aggregating them to ensemble feature ranking by ensemble learning, (ii) assessing the generalizability of ensemble feature ranking to acquire the optimal signature and (iii) indicating the phenotype association of discovered signature. POSREG is unique in its capacity of discovering the proteomic signature by simultaneously optimizing its reproducibility and generalizability. It is now accessible free of charge without any registration or login requirement at https://idrblab.org/posreg/
The label-free quantification (LFQ) has emerged as an exceptional technique in proteomics owing to its broad proteome coverage, great dynamic ranges and enhanced analytical reproducibility. Due to the extreme difficulty lying in an in-depth quantification, the LFQ chains incorporating a variety of transformation, pretreatment and imputation methods are required and constructed. However, it remains challenging to determine the well-performing chain, owing to its strong dependence on the studied data and the diverse possibility of integrated chains. In this study, an R package EVALFQ was therefore constructed to enable a performance evaluation on >3000 LFQ chains. This package is unique in (a) automatically evaluating the performance using multiple criteria, (b) exploring the quantification accuracy based on spiking proteins and (c) discovering the well-performing chains by comprehensive assessment. All in all, because of its superiority in assessing from multiple perspectives and scanning among over 3000 chains, this package is expected to attract broad interests from the fields of proteomic quantification. The package is available at https://github.com/idrblab/EVALFQ.
ANPELA is widely used for quantifying traditional bulk proteomic data. Recently, there is a clear shift from bulk proteomics to the single-cell ones (SCP), for which powerful cytometry techniques demonstrate the fantastic capacity of capturing cellular heterogeneity that is completely overlooked by traditional bulk profiling. However, the in-depth and high-quality quantification of SCP data is still challenging and severely affected by the large numbers of quantification workflows and extreme performance dependence on the studied datasets. In other words, the proper selection of well-performing workflow(s) for any studied dataset is elusory, and it is urgently needed to have a significantly enhanced and accelerated tool to address this issue. However, no such tool is developed yet. Herein, ANPELA is therefore updated to its 2.0 version (https://idrblab.org/anpela/), which is unique in providing the most comprehensive set of quantification alternatives (>1000 workflows) among all existing tools, enabling systematic performance evaluation from multiple perspectives based on machine learning, and identifying the optimal workflow(s) using overall performance ranking together with the parallel computation. Extensive validation on different benchmark datasets and representative application scenarios suggest the great application potential of ANPELA in current SCP research for gaining more accurate and reliable biological insights.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.