2021
DOI: 10.1093/bioinformatics/btab536

Organism-specific training improves performance of linear B-cell epitope prediction

Abstract: Motivation: In silico identification of linear B-cell epitopes represents an important step in the development of diagnostic tests and vaccine candidates, by providing potential high-probability targets for experimental investigation. Current predictive tools were developed under a generalist approach, training models with heterogeneous data sets to develop predictors that can be deployed for a wide variety of pathogens. However, continuous advances in processing power and the increasing amoun…

Citations: cited by 11 publications (41 citation statements)
References: 46 publications
“…The results also show that organism-specific training outperforms generalist training (predictors from the literature, trained on peptides from a wide variety of pathogens) even when very small organism-specific data sets are available. The only systematic exception was the high observed performance of LBtope for the Hepatitis C Virus; However, as mentioned earlier, "part of the hold-out examples used to asses the performance of the models is present in the training data of LBtope (9.59% of the Hep C hold-out sequences are present in the LBtope training data set)" [45], which in the case of our experiments would result in some level of data leakage [49]. In addition to showing that organism-specific training outperforms heterogeneous and hybrid training, this work shows that adding unrelated data to organism-specific training sets decreases the generalisation performance of the resulting model when tasked with predicting epitopes for the target pathogen.…”
Section: Discussion (citation type: mentioning)
confidence: 93%
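The data-leakage concern in the statement above comes down to measuring how many hold-out sequences also occur in a predictor's training data. A minimal sketch of that check follows, assuming exact string matching after case normalisation; the quoted text does not say how the 9.59% figure was computed, and the function name and toy peptides here are hypothetical.

```python
def holdout_overlap_fraction(train_peptides, holdout_peptides):
    """Fraction of distinct hold-out sequences that also occur in the training set.

    Assumption: two peptides count as the same only if their sequences match
    exactly after trimming whitespace and upper-casing.
    """
    train = {p.strip().upper() for p in train_peptides}
    holdout = {p.strip().upper() for p in holdout_peptides}
    if not holdout:
        raise ValueError("hold-out set is empty")
    return len(holdout & train) / len(holdout)


# Toy, made-up sequences purely for illustration.
train = ["AAKLV", "GGSTP", "MKTLL"]
holdout = ["aaklv", "QQRST", "PLMNV", "GGSTP"]

fraction = holdout_overlap_fraction(train, holdout)
print(f"{fraction:.2%} of hold-out sequences are present in the training data")
```

A non-zero fraction means the hold-out performance of that predictor is likely to be optimistic, which is the caveat the citing authors attach to the LBtope result.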
“…Epitope prediction models were developed by training Random Forest (RF) predictors on each of the training data sets outlined above, using Scikit-learn version 0.24.1 [48] under standard hyper-parameter values. The choice of Random Forest was based on preliminary experimentation, as documented in [45], and also to make this work more directly comparable with the results reported in that earlier one. The trained models were then used to generate predictions for the organism-specific hold-out data sets and prediction performance was assessed using multiple different performance measures, namely: Balanced Accuracy (BAL.ACC), Matthew's Correlation Coefficient (MCC), Area Under the Curve (AUC), Positive Predictive Value (PPV), Negative Predictive Value (NPV) and Sensitivity (SENS).…”
Section: Modelling and Performance Assessment (citation type: mentioning)
confidence: 96%
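The six performance measures named in the statement above can all be derived from binary labels and predicted scores. The sketch below uses only the Python standard library and is not the cited authors' code: the 0.5 decision threshold, the function names and the toy data are assumptions for illustration. In scikit-learn, `balanced_accuracy_score`, `matthews_corrcoef` and `roc_auc_score` in `sklearn.metrics` compute the first three.

```python
from math import sqrt


def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def auc(y_true, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return safe_div(wins, len(pos) * len(neg))


def performance_summary(y_true, scores, threshold=0.5):
    """Compute the six measures; the threshold is an assumed default."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sens = safe_div(tp, tp + fn)
    spec = safe_div(tn, tn + fp)
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "BAL.ACC": (sens + spec) / 2,
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
        "AUC": auc(y_true, scores),
        "PPV": safe_div(tp, tp + fp),
        "NPV": safe_div(tn, tn + fn),
        "SENS": sens,
    }


# Toy labels (1 = epitope) and predicted probabilities, made up for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

summary = performance_summary(y_true, scores)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Balanced accuracy and MCC are the more informative measures when epitope and non-epitope classes are imbalanced, which is one plausible reason for reporting several measures side by side.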