2021
DOI: 10.1016/j.patter.2021.100245

A community effort to identify and correct mislabeled samples in proteogenomic studies

Abstract: Sample mislabeling or misannotation has been a long-standing problem in scientific research, particularly prevalent in large-scale, multi-omic studies due to the complexity of multi-omic workflows. There exists an urgent need for implementing quality controls to automatically screen for and correct sample mislabels or misannotations in multi-omic studies. Here, we describe a crowdsourced precisionFDA NCI-CPTAC Multi-omics Enabled Sample Mislabeling Correction Challenge, which provides a fram…

Cited by 6 publications (3 citation statements)
References 33 publications
“…Thus, Figure 7c,d shows that the performance of PLS-DA on textiles was poor, and the robustness of XGBoost was better. Last, since manual labeling was used in this work, as shown in Figure 7b, mislabeling was inevitable 33 owing to limited experience and knowledge. In the last line of image Test3, there were some strings interspersed in the middle of the bamboo board that were inaccurately labeled as a type of wood.…”
Section: Testing Result; citation type: mentioning
confidence: 99%
“…A recent study by Yoo et al. (26), for instance, reported a community effort to address sample mislabelling issues in proteogenomic and multi-omics studies, and found 7.5% and 3.5% mislabelled samples in two datasets. To our best knowledge, tissue heterogeneity has not been addressed on a large scale by such a community effort.…”
Section: Discussion; citation type: mentioning
confidence: 99%
“…Therefore, it is essential to integrate real-life data from communities with complementary technical strengths and complex performances. A paradigm model is the crowdsourced precisionFDA challenges, which leverages the power of community participants to identify the QC tools with high accuracy and robustness 17 , and to upgrade benchmarks for easy- and difficult-to-map genomics regions 18 , etc. This exemplary model deserves to be extended to more dimensions with other types of omic studies to help researchers gain the knowledge and resources to ensure data quality and thus improve the reliability of omics-based biological discoveries.…”
Citation type: mentioning
confidence: 99%