In this work, we developed a general perturbation theory and machine learning method for data mining of proteomes to discover new B-cell epitopes useful for vaccine design. The method predicts the epitope activity ε(c) of a query peptide (q-peptide) under a set of experimental query conditions (c). As input, it uses the sequence of the q-peptide together with the sequence and epitope activity ε(c) of a reference peptide (r-peptide) assayed under similar experimental conditions (c). The proposed model classifies 1 048 190 pairs of query and reference peptide sequences from the proteomes of many organisms reported in the IEDB database. These pairs differ by variations (perturbations) in sequence or assay conditions. Accuracy, sensitivity, and specificity range between 71% and 80% for both the training and external validation series. The retrieved information contains structural changes in 83 683 peptide sequences (Seq) determined in experimental assays with boundary conditions involving 1448 epitope organisms (Org), 323 host organisms (Host), 15 types of in vivo process (Proc), 28 experimental techniques (Tech), and 505 adjuvant additives (Adj). We then report the experimental sampling, isolation, and sequencing of 15 complete sequences of the Bm86 gene from the state of Colima, Mexico. Last, we used the model to predict the epitope immunogenic scores under different experimental conditions for the 26 112 peptides obtained from these sequences. The model may become a useful tool for epitope selection toward vaccine design. The theoretical-experimental results on the Bm86 protein may help the future design of a new vaccine based on this protein.
Multi-target Quantitative Structure-Activity Relationship (mt-QSAR) techniques may become an important tool for the prediction of cytotoxicity and High-throughput Screening (HTS) of drugs to rationalize the drug discovery process. In this work, we train and validate, for the first time, an mt-QSAR model that uses the TOPS-MODE approach to calculate drug molecular descriptors and a Linear Discriminant Analysis (LDA) function. This model correctly classifies 8258 out of 9000 (Accuracy = 91.76%) multiplexing assay endpoints of 7903 drugs (including both training and validation series). Each endpoint corresponds to one of 1418 assays, 36 molecular and cellular targets, and 46 standard type measures, in two possible organisms (human and mouse). After that, we determined experimentally, for the first time, the values of EC50 = 21.58 μg/mL and Cytotoxicity = 23.6% for the anti-microbial/anti-parasite drug G1 in Balb/C mouse peritoneal macrophages using flow cytometry. In addition, the model predicts for G1 only 7 positive endpoints out of 1251 cytotoxicity assays (a 0.56% probability of cytotoxicity in multiple assays). The results obtained complement the toxicological studies of this important drug. This work adds a new tool to the small pool of existing methods useful for multi-target HTS of ChEMBL and other compound libraries toward drug discovery.
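The two percentages quoted above follow directly from the reported counts. As a minimal check, the sketch below recomputes them in Python; the helper name `percent` and the variable names are illustrative assumptions and are not part of the original work.

```python
def percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


# Counts taken from the abstract; the variable names are hypothetical.
accuracy = percent(8258, 9000)       # correctly classified assay endpoints
g1_positive_rate = percent(7, 1251)  # positive cytotoxicity endpoints predicted for G1

print(f"Accuracy: {accuracy:.2f}%")
print(f"G1 positive endpoints: {g1_positive_rate:.2f}%")
```

Rounded to two decimals, both values agree with the figures stated in the abstract (91.76% and 0.56%).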