Predicting protein-ligand interactions using artificial intelligence (AI) models has attracted great interest in recent years. However, data-driven AI models suffer from a lack of sufficiently large and unbiased datasets. Here, we systematically investigated data biases in the PDBbind and DUD-E datasets. We examined the performance of the atomic convolutional neural network (ACNN) on the PDBbind core set and achieved a Pearson R² of 0.73 between experimental and predicted binding affinities. Strikingly, the ACNN models did not need to learn the essential protein-ligand interactions in complex structures: they achieved similar performance on datasets containing only ligand structures or only protein structures, while data splitting based on similarity clustering (protein sequence or ligand scaffold) significantly reduced model performance. We also identified property and topology biases in the DUD-E dataset, which artificially increased the enrichment performance of virtual screening. The property bias in DUD-E was reduced by enforcing more stringent ligand property matching rules, while the topology bias persists because molecular fingerprint similarity is used as a decoy selection criterion. Therefore, we believe that sufficiently large and unbiased datasets are desirable for training robust AI models to accurately predict protein-ligand interactions.
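The similarity-clustered data splitting mentioned above can be illustrated with a minimal sketch. It assumes cluster labels (for example from protein sequence or ligand scaffold clustering) have already been computed; the function name `cluster_split`, the `test_fraction` parameter, and the example labels are hypothetical and not taken from the paper. The point is only that whole clusters go to one side of the split, so similar complexes never appear in both training and test sets.

```python
import random
from collections import defaultdict


def cluster_split(cluster_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no cluster spans both train and test.

    cluster_ids: one precomputed cluster label per sample (assumed input).
    Returns (train_indices, test_indices), each sorted.
    """
    groups = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        groups[cid].append(idx)

    # Shuffle clusters (not samples) with a fixed seed for reproducibility.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Fill the test set cluster by cluster until the target size is reached.
    target = test_fraction * len(cluster_ids)
    train, test = [], []
    for cid in keys:
        (test if len(test) < target else train).extend(groups[cid])
    return sorted(train), sorted(test)


# Hypothetical labels for eight complexes, grouped into four clusters.
labels = ["c1", "c1", "c2", "c3", "c3", "c3", "c4", "c2"]
train_idx, test_idx = cluster_split(labels, test_fraction=0.25)
```

Because clusters are assigned whole, the test set may end up somewhat larger than `test_fraction` when a large cluster is drawn; a random per-sample split would hit the fraction exactly but would leak similar structures across the split.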
Fault diagnosis is of great importance to the rapid restoration of power systems, and many techniques have been applied to this problem. In this paper, a novel Genetic Algorithm (GA) based neural network for fault diagnosis in power systems is proposed, which adopts a three-layer feed-forward neural network. Dual GA loops are applied to optimize the neural network topology and the connection weights. The first GA loop performs structure optimization and the second performs connection weight optimization. Jointly, they search for the globally optimal neural network solution for fault diagnosis. The formulation and the corresponding computer flow chart are presented in detail in the paper. Computer test results on a test power system indicate that the proposed GA-based neural network fault diagnosis system works well and is superior to the conventional Back-Propagation (BP) neural network.
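The dual-loop idea can be sketched as follows. This is an illustrative toy under stated assumptions, not the paper's formulation: the outer loop evolves only the hidden-layer size, the inner loop evolves the connection weights by uniform crossover and Gaussian mutation, and all names, population sizes, the size penalty, and the toy alarm patterns are hypothetical.

```python
import math
import random


def n_weights(n_in, n_hidden):
    # Hidden weights and biases, plus output weights and one output bias.
    return n_hidden * (n_in + 1) + n_hidden + 1


def forward(w, n_in, n_hidden, x):
    """Three-layer feed-forward net: inputs -> tanh hidden -> sigmoid output."""
    hidden, k = [], 0
    for _ in range(n_hidden):
        s = w[k + n_in] + sum(w[k + i] * x[i] for i in range(n_in))
        hidden.append(math.tanh(s))
        k += n_in + 1
    s = w[k + n_hidden] + sum(w[k + j] * hidden[j] for j in range(n_hidden))
    s = max(-60.0, min(60.0, s))  # clamp to keep exp() in range
    return 1.0 / (1.0 + math.exp(-s))


def mse(w, n_in, n_hidden, data):
    return sum((forward(w, n_in, n_hidden, x) - y) ** 2 for x, y in data) / len(data)


def evolve_weights(n_in, n_hidden, data, rng, pop_size=20, generations=40):
    """Inner GA loop: optimize connection weights for a fixed topology."""
    size = n_weights(n_in, n_hidden)
    pop = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, n_in, n_hidden, data))
        elite = pop[: pop_size // 2]  # elitism: best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            children.append([rng.choice(p) + rng.gauss(0, 0.3) for p in zip(a, b)])
        pop = elite + children
    best = min(pop, key=lambda w: mse(w, n_in, n_hidden, data))
    return best, mse(best, n_in, n_hidden, data)


def evolve_topology(data, n_in, rng, max_hidden=5, pop_size=4,
                    generations=3, size_penalty=0.01):
    """Outer GA loop: optimize the hidden-layer size (assumed topology gene)."""
    def score(h):
        w, err = evolve_weights(n_in, h, data, rng)
        return err + size_penalty * h, h, w, err  # penalize larger networks

    pop = [rng.randint(1, max_hidden) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = sorted((score(h) for h in pop), key=lambda t: t[0])
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        survivors = [t[1] for t in scored[: pop_size // 2]]
        mutants = [min(max_hidden, max(1, h + rng.choice((-1, 1)))) for h in survivors]
        pop = survivors + mutants
    _, h, w, err = best
    return h, w, err


# Hypothetical toy patterns: two alarm bits -> fault (1) or no fault (0).
TOY_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
hidden_units, weights, error = evolve_topology(TOY_DATA, 2, random.Random(0))
```

Nesting the weight search inside the topology search makes each topology's fitness the best error its weights can reach, at the cost of many inner runs; the small populations here are chosen only to keep the sketch fast.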