The suitability of decision trees in comparison to support vector machines for the classification of chemical compounds into drugs and nondrugs was investigated. To account for the requirements of screening virtual compound libraries, schemes for successive filtering steps with gradually increasing computational cost are outlined. The obtained prediction accuracy was similar between decision trees and support vector machine approaches for the applied compound data sets. By using rapidly computable variables such as druglikeness indices, XlogP, and the molar refractivity, at least 39% of the nondrugs can be filtered out, while retaining more than 83% of the actual drugs. Computationally more demanding descriptors such as specific substructure queries and quantum chemically derived variables can be postponed to subsequent classification schemes for the reduced set of compounds, whereby up to 92% of the nondrugs can be sorted out without losing considerably more drugs. Using all available computed descriptors simultaneously in the first step did not yield significantly better results. Furthermore, the generated decision trees are used to derive guidelines for the design of druglike substances. The numerical margins found at the branching points suggest several criteria that separate drugs from nondrugs: a molecular weight higher than 230, a molar refractivity higher than 40, and the presence of one or more rings as well as one or more functional groups. Also reported are additionally required parameters to compute values for XlogP, SlogP, and the molar refractivity of boron- and silicon-containing compounds.
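The four criteria listed at the end of this abstract can be read as a cheap first-pass filter. The following is a minimal sketch under stated assumptions, not the authors' decision tree: the `Compound` record, its field names, and the choice to require all four criteria at once are hypothetical, and the descriptor values are assumed to have been computed beforehand by some cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Compound:
    """Hypothetical record of precomputed descriptors for one compound."""
    name: str
    mol_weight: float            # molecular weight
    molar_refractivity: float    # molar refractivity
    ring_count: int              # number of rings
    functional_group_count: int  # number of functional groups


def passes_druglike_guidelines(c: Compound) -> bool:
    """Apply the thresholds quoted in the abstract.

    Assumption: the criteria are combined with a logical AND. The abstract
    only lists them as branching-point margins and does not say how they
    are combined in the actual trees.
    """
    return (
        c.mol_weight > 230
        and c.molar_refractivity > 40
        and c.ring_count >= 1
        and c.functional_group_count >= 1
    )


def prefilter(compounds: Iterable[Compound]) -> List[Compound]:
    """Keep only compounds that pass the cheap first-step filter."""
    return [c for c in compounds if passes_druglike_guidelines(c)]


if __name__ == "__main__":
    # Illustrative, made-up descriptor values (not data from the study).
    library = [
        Compound("cmpd_A", 310.4, 85.2, 2, 3),
        Compound("cmpd_B", 120.1, 30.5, 0, 1),
    ]
    for c in prefilter(library):
        print(c.name)
```

In a staged scheme like the one the abstract outlines, only the compounds returned by `prefilter` would be passed on to the more expensive substructure or quantum chemical descriptors.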
Background: Chronic obstructive pulmonary disease (COPD) is a respiratory inflammatory condition with autoimmune features including IgG autoantibodies. In this study we analyze the complexity of the autoantibody response and reveal the nature of the antigens that are recognized by autoantibodies in COPD patients.
Seroreactivity profiling emerges as a valuable technique for minimally invasive cancer detection. Recently, we provided first evidence for the applicability of serum profiling of glioma using a limited number of immunogenic antigens. Here, we screened 57 glioma and 60 healthy sera for autoantibodies against 1827 Escherichia coli expressed clones, including 509 in-frame peptide sequences. By a linear support vector machine approach, we calculated mean specificity, sensitivity, and accuracy of 100 repetitive classifications. We were able to differentiate glioma sera from sera of the healthy controls with a specificity of 90.28%, a sensitivity of 87.31%, and an accuracy of 88.84%. We were also able to differentiate World Health Organization grade IV glioma sera from healthy sera with a specificity of 98.45%, a sensitivity of 80.93%, and an accuracy of 92.88%. To rank the antigens according to their information content, we computed the area under the receiver operating characteristic curve value for each clone. Altogether, we found 46 immunogenic clones including 16 in-frame clones that were informative for the classification of glioma sera versus healthy sera. For the separation of glioblastoma versus healthy sera, we found 91 informative clones including 26 in-frame clones. The best-suited in-frame clone for the classification of glioma sera versus healthy sera corresponded to the vimentin gene (VIM) that was previously associated with glioma. In the future, autoantibody signatures in glioma not only may prove useful for diagnosis but also offer the prospect for a personalized immune-based therapy.
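The quantities reported in this abstract (specificity, sensitivity, accuracy, and a per-clone area under the ROC curve) can be sketched in plain Python. This is an illustration under assumptions, not the study's pipeline: the function names, the input layout (one list of signal values per clone, one 0/1 label per serum), and ranking clones by the distance of their AUC from 0.5 are all hypothetical choices.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy for 0/1 labels (1 = patient).

    Assumes both classes occur in y_true, otherwise a ratio is undefined.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a positive scores above a negative.

    Ties count as one half (the Mann-Whitney formulation of the AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def rank_clones(
    signal_by_clone: Dict[str, Sequence[float]], labels: Sequence[int]
) -> List[Tuple[str, float]]:
    """Rank clones by how far their AUC lies from 0.5 (assumed criterion)."""
    ranked = []
    for clone, signals in signal_by_clone.items():
        pos = [s for s, l in zip(signals, labels) if l == 1]
        neg = [s for s, l in zip(signals, labels) if l == 0]
        ranked.append((clone, auc(pos, neg)))
    return sorted(ranked, key=lambda item: abs(item[1] - 0.5), reverse=True)
```

A clone with an AUC near 0.5 carries little information for separating the two groups, while values near 0 or 1 indicate a strong difference in reactivity; that is the rationale for the assumed ranking key.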
Background: Lung cancer is a very frequent and lethal tumor with an identifiable risk population. Cytological analysis and chest X-ray failed to reduce mortality, and CT screenings are still controversially discussed. Recent studies provided first evidence for the potential usefulness of autoantigens as markers for lung cancer. Methods: We used extended panels of arrayed antigens and determined autoantibody signatures of sera from patients with different kinds of lung cancer, different common non-tumor lung pathologies, and controls without any lung disease by a newly developed computer-aided image analysis procedure. The resulting signatures were classified using linear kernel Support Vector Machines and 10-fold cross-validation. Results: The novel approach allowed for discriminating lung cancer patients from controls without any lung disease with a specificity of 97.0%, a sensitivity of 97.9%, and an accuracy of 97.6%. The classification of stage IA/IB tumors and controls yielded a specificity of 97.6%, a sensitivity of 75.9%, and an accuracy of 92.9%. The discrimination of lung cancer patients from patients with non-tumor lung pathologies reached an accuracy of 88.5%. Conclusion: We were able to separate lung cancer patients from subjects without any lung disease with high accuracy. Furthermore, lung cancer patients could be separated from patients with other non-tumor lung diseases. These results provide clear evidence that blood-based tests open new avenues for the early diagnosis of lung cancer.
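The 10-fold cross-validation mentioned in the Methods can be sketched as an index splitter. This is a minimal sketch under assumptions: the function name `k_fold_indices`, the shuffling with a fixed seed, and the absence of class stratification are choices made here, since the abstract does not describe how the folds were formed. The classifier itself is left out.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds.

    Samples are shuffled once, then dealt into k folds round-robin, so
    fold sizes differ by at most one. Every sample is used for testing
    exactly once across the k iterations.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    # Each round: fit a classifier on `train`, evaluate it on `test`.
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

Performance figures such as specificity, sensitivity, and accuracy would then be computed from the held-out predictions collected across the folds.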