Artificial intelligence, in particular machine learning (ML), has emerged as a promising pillar in efforts to overcome the high failure rate in drug development. Here, we present a primer on the ML algorithms most commonly used in drug discovery and development. We also list possible data sources, describe good practices for ML model development and validation, and share a reproducible example. A companion article will summarize applications of ML in drug discovery, drug development, and the postapproval phase.
Background: The recent determination of the complete nucleotide sequence of several Mycobacterium tuberculosis (MTB) genomes allows the use of comparative genomics as a tool for dissecting the nature and consequence of genetic variability within this species. The multiple alignment of the genomes of clinical strains (CDC1551, F11, Haarlem and C), along with the genomes of laboratory strains (H37Rv and H37Ra), provides new insights on the mechanisms of adaptation of this bacterium to the human host. Findings: The genetic variation found in six M. tuberculosis strains does not involve significant genomic rearrangements. Most of the variation results from deletion and transposition events preferentially associated with insertion sequences and genes of the PE/PPE family but not with genes implicated in virulence. Using the Perl-based software islandsanalyser, which creates a representation of the genetic variation in the genome, we identified differences in the patterns of distribution and frequency of the polymorphisms across the genome. The identification of genes displaying strain-specific polymorphisms and the extrapolation of the number of strain-specific polymorphisms to an unlimited number of genomes indicate that the different strains contain a limited number of unique polymorphisms. Conclusion: The comparison of multiple genomes demonstrates that the M. tuberculosis genome is currently undergoing an active process of gene decay, analogous to the adaptation process of obligate bacterial symbionts. This observation opens new perspectives into the evolution and the understanding of the pathogenesis of this bacterium.
Chagas disease is a neglected tropical disease endemic to Latin America, though migratory movements have recently spread it to other regions. Here, we have applied a cascade virtual screening campaign combining ligand- and structure-based methods. In order to find novel inhibitors of putrescine uptake in Trypanosoma cruzi, an ensemble of linear ligand-based classifiers obtained by has been applied as an initial screening filter, followed by docking into a homology model of the putrescine permease TcPAT12. 1,000 individual linear classifiers were inferred from a balanced dataset. Subsequently, different schemes were tested to combine the individual classifiers: MIN operator, average ranking, average score, and average voting, with the MIN operator leading to the best performance. The homology model was based on the arginine/agmatine antiporter (AdiC) from Escherichia coli as template. It showed 64% coverage of the entire query sequence and was selected based on the normalized Discrete Optimized Protein Energy parameter and the GA341 score. The modeled structure had 96% of its residues in the allowed regions of the Ramachandran plot, and none of the residues located in non-allowed regions were involved in the active site of the transporter. Positive Predictive Value surfaces were applied to optimize the score thresholds to be used in the ligand-based virtual screening step: for that purpose, Positive Predictive Value was charted as a function of putative yields of actives in the range 0.001–0.010 and the Se/Sp ratio. With a focus on drug repositioning opportunities, the DrugBank and Sweetlead databases were subjected to screening. Among 8 hits, cinnarizine, a drug frequently prescribed for motion sickness and balance disorder, was tested against T. cruzi epimastigotes and amastigotes, confirming its trypanocidal effects and its inhibitory effects on putrescine uptake. Furthermore, clofazimine, an antibiotic with already proven trypanocidal effects, also displayed inhibitory effects on putrescine uptake. Two other hits, meclizine and butoconazole, also displayed trypanocidal effects (in the case of meclizine, against both epimastigotes and amastigotes), without inhibiting putrescine uptake.
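The combination schemes named in the abstract above (MIN operator, average score, average voting, average ranking) and the dependence of Positive Predictive Value on the yield of actives can be illustrated with a small sketch. This is not the authors' implementation: the function names, the 0.5 vote threshold, and the score layout are hypothetical, and the PPV expression is the standard Bayes form, PPV = Se·Ya / (Se·Ya + (1 − Sp)(1 − Ya)), which the abstract does not spell out.

```python
from statistics import mean


def ranks(scores):
    """Rank scores in descending order (1 = highest); tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the one at `pos`.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            result[order[k]] = shared
        pos = end + 1
    return result


def combine(score_matrix, scheme, threshold=0.5):
    """Combine per-model scores into one value per compound.

    score_matrix[m][c] is the score that model m assigns to compound c.
    For "min", "average_score" and "average_vote", higher means more likely active;
    for "average_rank", lower means more likely active.
    The vote threshold is an illustrative assumption.
    """
    per_compound = list(zip(*score_matrix))
    if scheme == "min":
        return [min(s) for s in per_compound]
    if scheme == "average_score":
        return [mean(s) for s in per_compound]
    if scheme == "average_vote":
        return [mean(1.0 if x >= threshold else 0.0 for x in s) for s in per_compound]
    if scheme == "average_rank":
        rank_rows = [ranks(list(row)) for row in score_matrix]
        return [mean(r) for r in zip(*rank_rows)]
    raise ValueError(f"unknown scheme: {scheme}")


def ppv(se, sp, ya):
    """Positive Predictive Value from sensitivity, specificity and yield of actives (Bayes)."""
    return se * ya / (se * ya + (1.0 - sp) * (1.0 - ya))
```

Evaluating `ppv` over a grid of `ya` values between 0.001 and 0.010 for the Se/Sp pairs obtained at each candidate score threshold would give a surface of the kind the abstract describes. With actives this rare, even a small loss of specificity sharply lowers PPV, which is presumably why the threshold was tuned on this quantity.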
Early clinical trials of therapies to treat Duchenne muscular dystrophy (DMD), a fatal genetic X-linked pediatric disease, have been designed based on the limited understanding of natural disease progression and variability in clinical measures over different stages of the continuum of the disease. The objective was to inform the design of DMD clinical trials by developing a disease progression model-based clinical trial simulation (CTS) platform based on measures commonly used in DMD trials. Data were integrated from past studies through the Duchenne Regulatory Science Consortium founded by the Critical Path Institute (15 clinical trials and studies, 1505 subjects, 27,252 observations). Using a nonlinear mixed-effects modeling approach, longitudinal dynamics of five measures were modeled (North Star Ambulatory Assessment, forced vital capacity, and the velocities of the following three timed functional tests: time to stand from supine, time to climb 4 stairs, and 10 meter walk-run time). The models were validated on external data sets and captured longitudinal changes in the five measures well, including both early disease when function improves as a result of growth and development and the decline in function in later stages. The models can be used in the CTS platform to perform trial simulations to optimize the selection of inclusion/exclusion criteria, selection of measures, and other trial parameters. The data sets and models have been reviewed by the US Food and Drug Administration and the European Medicines Agency; have been accepted into the Fit-for-Purpose and Qualification for Novel Methodologies pathways, respectively; and will be submitted for potential endorsement by both agencies.
Current medical treatments for recurrent pulmonary infections caused by Pseudomonas aeruginosa, such as those associated with cystic fibrosis (CF), involve the administration of inhalable antibiotics.
The Blood-Brain Barrier (BBB) is a physical and biochemical barrier that restricts the entry of certain drugs to the Central Nervous System (CNS), while allowing the passage of others. The ability to predict the permeability of a given molecule through the BBB is a key aspect in CNS drug discovery and development, since neurotherapeutic agents with molecular targets in the CNS should be able to cross the BBB, whereas peripherally acting agents should not, to minimize the risk of CNS adverse effects. In this review we examine and discuss QSAR approaches and the current availability of experimental data for the construction of BBB permeability predictive models, focusing on the modeling of the biorelevant parameter unbound partitioning coefficient (Kp,uu). Emphasis is placed on two possible strategies to overcome the current limitations of in silico models: considering the prediction of brain penetration as a multifactorial problem, and increasing experimental datasets through accurate and standardized experimental techniques.
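For orientation, Kp,uu is conventionally defined as the steady-state ratio of unbound drug concentration in brain to unbound drug concentration in plasma, and it can be related to the total brain-to-plasma ratio Kp through the unbound fractions. The symbols below follow common pharmacokinetic notation and are an assumption, not notation taken from the review itself.

```latex
K_{p,uu} = \frac{C_{u,\mathrm{brain}}}{C_{u,\mathrm{plasma}}}
         = K_p \,\frac{f_{u,\mathrm{brain}}}{f_{u,\mathrm{plasma}}},
\qquad
K_p = \frac{C_{\mathrm{tot,brain}}}{C_{\mathrm{tot,plasma}}}
```

Under this definition, a value close to 1 is usually read as predominantly passive transport across the BBB, a value below 1 as net efflux, and a value above 1 as net active uptake.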
Breast Cancer Resistance Protein (BCRP) is an ATP-dependent efflux transporter linked to the multidrug resistance phenomenon in many diseases such as epilepsy and cancer and a potential source of drug interactions. For these reasons, the early identification of substrates and nonsubstrates of this transporter during the drug discovery stage is of great interest. We have developed a computational nonlinear model ensemble based on conformation-independent molecular descriptors using a combined strategy of genetic algorithms, J48 decision tree classifiers, and data fusion. The best model ensemble consists of averaging the ranking of the 12 decision trees that showed the best performance on the training set; it also performed well on the test set. It was experimentally validated using the ex vivo everted rat intestinal sac model. Five anticonvulsant drugs classified as nonsubstrates for BCRP by the model ensemble were experimentally evaluated, and none of them proved to be a BCRP substrate under the experimental conditions used, thus confirming the predictive ability of the model ensemble. The model ensemble reported here is a potentially valuable tool to be used as an in silico ADME filter in computer-aided drug discovery campaigns intended to overcome BCRP-mediated multidrug resistance issues and to prevent drug-drug interactions.
Alzheimer’s disease (AD) is the leading cause of dementia worldwide. With 35 million people over 60 years of age with dementia, there is an urgent need to develop new treatments for AD. To streamline this process, it is imperative to apply insights and learnings from past failures to future drug development programs. In the present work, we focus on how modeling and simulation tools can leverage open data to address drug development challenges in AD.