The development of rational techniques to discover new proteins for use in a variety of applications, ranging from agriculture to biotechnology, remains an outstanding materials design problem. The key barrier is designing a sequence that folds into a predictable structure to achieve a given material function. Focusing on alpha-helical proteins, we report a Multi-scale Neighborhood-based Neural Network (MNNN) model that learns how a specific amino acid sequence folds into a protein structure. The algorithm predicts the protein structure without using a template or co-evolutional information, with a maximum error of 2.1 Å. We find that the prediction accuracy is higher than that of other models and that the prediction takes up to six orders of magnitude less time than ab initio folding methods. We demonstrate that MNNN can predict the structure of an unknown protein in agreement with experiments; our model hence shows a great advantage in the rational design of de novo proteins.
Automated planning is one of the foundational areas of AI. Since no single planner works well for all tasks and domains, portfolio-based techniques have become increasingly popular in recent years. In particular, deep learning has emerged as a promising methodology for online planner selection. Building on the recent development of structural graph representations of planning tasks, we propose a graph neural network (GNN) approach to selecting candidate planners. GNNs are advantageous over a straightforward alternative, convolutional neural networks, in that they are invariant to node permutations and incorporate node labels for better inference. Additionally, for cost-optimal planning, we propose a two-stage adaptive scheduling method to further improve the likelihood that a given task is solved in time. The scheduler may switch at halftime to a different planner, conditioned on the observed performance of the first one. Experimental results validate the effectiveness of the proposed method against strong baselines, both deep learning and non-deep learning based. The code is available at https://github.com/matenure/GNN_planner.
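The two-stage scheduling idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `SimulatedRunner`, `two_stage_schedule`, and `choose_second`, as well as the toy planners and time budget, are hypothetical, and the learned selection models are replaced by plain callables.

```python
from typing import Callable, Dict, Tuple


class SimulatedRunner:
    """Toy stand-in for real planner processes (an assumption for this sketch).

    Each planner solves the task once it has accumulated a fixed amount of
    run time, so continuing with the same planner resumes rather than restarts.
    """

    def __init__(self, needed_s: Dict[str, float]) -> None:
        self.needed_s = needed_s
        self.spent_s = {name: 0.0 for name in needed_s}

    def run(self, planner: str, seconds: float) -> bool:
        """Run `planner` for `seconds` more; return True if the task is solved."""
        self.spent_s[planner] += seconds
        return self.spent_s[planner] >= self.needed_s[planner]


def two_stage_schedule(
    first: str,
    choose_second: Callable[[str, bool], str],
    runner: SimulatedRunner,
    budget_s: float,
) -> Tuple[str, bool]:
    """Run `first` for half the budget, then let `choose_second` decide.

    `choose_second(first, solved_first)` is a hypothetical hook standing in for
    a second model conditioned on the observed performance of the first planner.
    It may return `first` again (keep going) or a different planner (switch).
    """
    half = budget_s / 2.0
    if runner.run(first, half):
        return first, True
    second = choose_second(first, False)
    return second, runner.run(second, budget_s - half)


# Usage: planner "A" needs 1000 s, planner "B" needs 400 s, budget is 1800 s.
switch = two_stage_schedule(
    "A", lambda first, solved: "B", SimulatedRunner({"A": 1000.0, "B": 400.0}), 1800.0
)
stay = two_stage_schedule(
    "A", lambda first, solved: first, SimulatedRunner({"A": 1000.0, "B": 400.0}), 1800.0
)
```

In this toy setting both choices solve the task, but with a harder task for planner "A" only the switch would succeed, which is the situation the halftime decision is meant to catch.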
We report an artificial intelligence (AI) based method to predict the molecular structure of proteins, focused here on an important subclass of proteins dominated by alpha-helix secondary structure, as found in many structural biomaterials such as keratin and membrane proteins. Fast yet accurate predictions of an unknown protein's 3D all-atom structure can yield a pre-screened set of candidate proteins to be investigated further via large-scale protein expression in bacteria or yeast. However, classical molecular simulations are greatly limited by the time scale and significant computational cost needed for the complete folding of a long peptide into a complex structure from scratch, which can easily exceed the capability of a supercomputer. To accelerate simulations at low computational cost, we report here an innovative machine learning method that offers high-throughput prediction of the protein structure, as well as the material and biological functions, purely from the protein sequence. To achieve this, we designed a novel Multi-scale Neighborhood-based Neural Network (MNNN) model that is capable of learning the neighborhood-structured information in the raw protein sequence and is trained on a database of over 120,000 protein structures. The method directly predicts the phi-psi dihedral angles of the backbone of each constituent amino acid, which are then used to construct the full all-atom 3D structure of the corresponding protein without any template or co-evolutional information. We find that our machine learning model can accurately predict all dihedral angles of any target sequence. The prediction yields a maximum average error of 2.1 Å for the predicted 3D structure compared with experimental measurement. We find that predicting the folded structure with MNNN takes up to six orders of magnitude less time than classical molecular dynamics simulations, offering extremely fast folding predictions.
Our results suggest that the MNNN model can be used to greatly accelerate the prediction of protein structures.
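For readers unfamiliar with the phi-psi representation mentioned above, the following minimal sketch computes backbone dihedral angles from atomic coordinates, for example to compare predicted angles against an experimental structure. It is not part of the MNNN model. The function names and the input layout (a list of dicts with "N", "CA", and "C" coordinates) are assumptions made for illustration; the atom quadruples used for phi and psi follow the standard backbone definitions, and the sign of the angle is simply that of the atan2 formula used here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(residues):
    """Return a list of (phi, psi) per residue; None where undefined.

    phi_i uses C(i-1), N(i), CA(i), C(i); psi_i uses N(i), CA(i), C(i), N(i+1).
    `residues` is a hypothetical layout: dicts with "N", "CA", "C" 3D tuples.
    """
    angles = []
    for i, res in enumerate(residues):
        phi = psi = None
        if i > 0:
            phi = dihedral_deg(residues[i - 1]["C"], res["N"], res["CA"], res["C"])
        if i < len(residues) - 1:
            psi = dihedral_deg(res["N"], res["CA"], res["C"], residues[i + 1]["N"])
        angles.append((phi, psi))
    return angles


# Usage with made-up coordinates (not a real protein).
toy = [
    {"N": (0, 1, 0), "CA": (0, 0, 0), "C": (1, 0, 0)},
    {"N": (1, 0, 1), "CA": (2, 0, 1), "C": (2, 1, 1)},
]
toy_angles = phi_psi(toy)
```

The first residue has no phi and the last has no psi, because each angle needs an atom from the neighboring residue.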
Semantic parsing is a fundamental problem in natural language understanding, as it involves the mapping of natural language to structured forms such as executable queries or logic-like knowledge representations. Existing deep learning approaches for semantic parsing have shown promise on a variety of benchmark data sets, particularly on text-to-SQL parsing. However, most text-to-SQL parsers do not generalize to unseen data sets in different domains. In this paper, we propose a new cross-domain learning scheme to perform text-to-SQL translation and demonstrate its use on Spider, a large-scale cross-domain text-to-SQL data set. We improve upon a state-of-the-art Spider model, SyntaxSQLNet, by constructing a graph of column names for all databases and using graph neural networks to compute their embeddings. The resulting embeddings offer better cross-domain representations and SQL queries, as evidenced by substantial improvement on the Spider data set compared to SyntaxSQLNet.