Most antimicrobial peptides (AMPs) and anticancer peptides (ACPs) fold into membrane-disruptive cationic amphiphilic α-helices, but many of these are also unpredictably hemolytic and toxic. Here we exploited the ability of recurrent neural networks (RNNs) to distinguish active from inactive and non-hemolytic from hemolytic AMPs and ACPs to discover new non-hemolytic ACPs. Our discovery pipeline involved: 1) sequence generation using either a generative RNN or a genetic algorithm, 2) RNN classification for activity and hemolysis, 3) selection for sequence novelty, helicity, and amphiphilicity, and 4) synthesis and testing. Experimental evaluation of thirty-three peptides yielded eleven active ACPs, four of which were non-hemolytic, with properties resembling those of the natural ACP lasioglossin III. These experiments provide the first example of direct machine-learning-guided discovery of non-hemolytic ACPs.
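The selection step for amphiphilicity can be illustrated with a small sketch. The abstract does not say how amphiphilicity was scored, so the code below assumes one common proxy, the Eisenberg hydrophobic moment of an ideal α-helix (100° per residue), together with a crude net-charge count. The function names, the hydrophobicity scale, and the thresholds are illustrative assumptions, not the authors' pipeline.

```python
import math

# Eisenberg consensus hydrophobicity scale (an assumed choice of scale).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Mean hydrophobic moment per residue, assuming an ideal helix."""
    angle = math.radians(angle_deg)
    # Each residue contributes a vector of length H_i pointing at i * angle.
    sum_cos = sum(EISENBERG[aa] * math.cos(i * angle) for i, aa in enumerate(seq))
    sum_sin = sum(EISENBERG[aa] * math.sin(i * angle) for i, aa in enumerate(seq))
    return math.hypot(sum_cos, sum_sin) / len(seq)


def net_charge(seq: str) -> int:
    """Crude net charge at neutral pH: Lys/Arg count +1, Asp/Glu count -1."""
    return sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)


def passes_filter(seq: str, min_moment: float = 0.3, min_charge: int = 2) -> bool:
    """Hypothetical filter for cationic amphiphilic candidates (thresholds assumed)."""
    return hydrophobic_moment(seq) >= min_moment and net_charge(seq) >= min_charge
```

A sequence whose hydrophobic residues are spread evenly around the helix, such as eighteen leucines, has a moment close to zero and fails the filter, whereas a sequence that clusters hydrophobic and cationic residues on opposite helix faces scores higher.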
Chemical space maps help visualize similarities within molecular sets. However, the many different molecular similarity measures available result in a confusing number of possible comparisons. To overcome this limitation, we exploit the fact that tools designed for reaction informatics also work for alchemical processes that do not obey Lavoisier's principle of mass conservation, such as the transmutation of lead into gold. We start by using the differential reaction fingerprint (DRFP) to create tree-maps (TMAPs) representing the chemical space of pairs of drugs selected as being similar according to various molecular fingerprints. We then use the Transformer-based RXNMapper model to understand structural relationships between drugs, and its confidence score to distinguish between pairs related by chemically feasible transformations and pairs related by alchemical transmutations. This analysis reveals a diversity of structural similarity relationships that are otherwise difficult to analyze simultaneously. We exemplify this approach by visualizing FDA-approved drugs, EGFR inhibitors, and polymyxin B analogs.
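The pairing step can be sketched as follows, under stated assumptions. The abstract does not give the pairing criterion or the input format, so this sketch assumes pairs are kept when their Tanimoto similarity exceeds a threshold and that each pair is then written as a pseudo-reaction string of the form `A>>B`, the usual reaction SMILES layout that reaction tools accept. The character-bigram fingerprint is a toy stand-in for a real molecular fingerprint, and all function names are hypothetical.

```python
from itertools import combinations


def toy_fingerprint(smiles: str) -> set[str]:
    """Toy stand-in for a molecular fingerprint: the set of character bigrams."""
    return {smiles[i:i + 2] for i in range(len(smiles) - 1)}


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two set-valued fingerprints."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(drugs: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every pair at or above the threshold."""
    fps = {name: toy_fingerprint(smi) for name, smi in drugs.items()}
    pairs = []
    for name_a, name_b in combinations(drugs, 2):
        score = tanimoto(fps[name_a], fps[name_b])
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


def as_pseudo_reaction(smiles_a: str, smiles_b: str) -> str:
    """Write a drug pair as an 'alchemical' reaction SMILES: A>>B."""
    return f"{smiles_a}>>{smiles_b}"
```

In a real workflow the toy fingerprint would be replaced by an established molecular fingerprint, and the resulting pseudo-reaction strings would be passed to a reaction fingerprint encoder and an atom-mapping model.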
Antimicrobial peptides (AMPs) have gained significant attention in drug discovery because of their potential therapeutic applications in the fight against antimicrobial resistance. Because rationally designing AMPs is notoriously difficult, owing to the vast number of possible peptide sequences and their complex structure-activity relationship landscape, this problem is well suited to machine-learning models, which can be trained on available data to predict new sequences with a desired activity profile. Here we investigated the performance of large language models (LLMs) fine-tuned with data from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP) to predict the antimicrobial activity and hemolysis of AMPs from their amino acid sequence. We show that GPT-3-based models perform slightly better than previously reported recurrent neural networks (RNNs) and related architectures on comparable datasets. Furthermore, GPT-3-based models perform remarkably well in the low-data regime. Advantages in terms of training time and cost are also discussed.
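As a rough illustration of how sequence-label data might be prepared for this kind of fine-tuning, the sketch below writes peptide sequences as prompt/completion records in JSON Lines, the layout used by the legacy GPT-3 fine-tuning interface. The separator string, the label words, and the function names are assumptions for illustration; the abstract does not specify the prompt design.

```python
import json


def to_finetune_record(sequence: str, is_active: bool, separator: str = " ->") -> dict:
    """Build one prompt/completion record (separator and labels are assumed)."""
    return {
        "prompt": f"{sequence}{separator}",
        # A leading space in the completion is a common convention for GPT tokenizers.
        "completion": " active" if is_active else " inactive",
    }


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


# Hypothetical example sequences and labels, not taken from DBAASP.
examples = [("KLLKLLK", True), ("GGSGGSG", False)]
jsonl_text = to_jsonl([to_finetune_record(seq, label) for seq, label in examples])
```

The same layout would apply to a hemolysis classifier by swapping the label words, for example " hemolytic" and " non-hemolytic".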