MLPACK is a state-of-the-art, scalable, multi-platform C++ machine learning library released in late 2011. By leveraging modern features of C++, it offers both a simple, consistent API accessible to novice users and high performance and flexibility for expert users. MLPACK provides cutting-edge algorithms whose benchmarks exhibit far better performance than other leading machine learning libraries. MLPACK version 1.0.3, licensed under the LGPL, is available at http://www.mlpack.org.
Background: Protein structure determination using an X-ray free-electron laser (XFEL) involves analyzing and merging a large number of snapshot diffraction patterns. Convolutional neural networks (CNNs) are widely used to solve computer vision problems such as image classification and can be applied to diffraction pattern analysis. However, protein structure determination using CNNs alone remains unsolved. Methods: We collected a number of predominantly alpha-helical protein structures from the PDB and analyzed their geometry. Relatively straight helices were left unchanged, while curved ones were split into shorter helices. Finally, 88 two-helix protein structures were selected, with helix lengths from 5 to 38 residues (7 to 57 Å). For every structure, the radii, lengths, relative positions, and orientations of the helices were calculated. Diffraction patterns were calculated by direct modeling. Every structure was approximated as a pair of cylinders of given length and radius, and its diffraction image was then calculated with an explicit formula for I(R), where I(R) is the intensity at the detector point with radius vector R, V is the volume of the structure, A_0 is the amplitude of the X-ray wave, k_0 is the wave vector of the incident wave, and k is the wave vector of the scattered wave. The resulting collection of diffraction patterns was used to train and test the CNN. Several convolutional layers extract features from the input images, and a dense layer then solves a multi-class classification problem. The learnable parameters are obtained by minimizing the cross-entropy loss function. Results: Preliminary lengths and radii of helices with a given sequence can be obtained from molecular modeling. Taking this into account, our model demonstrates that helix pairs can be classified into up to 50 disjoint classes.
Conclusion: CNNs can be successfully used to classify idealized two-helix protein structures, which could support preliminary analysis of protein conformation. Our further efforts will be directed towards increasing the number of classes and extending our approach to more complex structures.
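The classification step named in the Methods, a dense layer trained by minimizing the cross-entropy loss, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the random 16-dimensional features are a hypothetical stand-in for the output of the convolutional layers, and the sample count, learning rate, and number of iterations are chosen only so that the example runs quickly. Only the class count of 50 comes from the abstract.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-probability assigned to the true class.
    probs = softmax(logits)
    n = logits.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()

def cross_entropy_grad(logits, labels):
    # Gradient of the mean loss with respect to the logits.
    probs = softmax(logits)
    n = logits.shape[0]
    probs[np.arange(n), labels] -= 1.0
    return probs / n

rng = np.random.default_rng(0)
num_classes = 50                          # upper bound quoted in the abstract
features = rng.normal(size=(200, 16))     # hypothetical stand-in for CNN features
labels = rng.integers(0, num_classes, size=200)
weights = np.zeros((16, num_classes))     # dense layer, bias omitted for brevity

initial_loss = cross_entropy(features @ weights, labels)
for _ in range(100):
    # Plain gradient descent on the dense-layer weights (assumed optimizer).
    grad_logits = cross_entropy_grad(features @ weights, labels)
    weights -= 0.5 * (features.T @ grad_logits)
final_loss = cross_entropy(features @ weights, labels)
```

With zero initial weights every class receives probability 1/50, so the starting loss equals ln(50); gradient descent then lowers it. A real pipeline would backpropagate the same gradient through the convolutional layers as well.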
Background: X-ray free-electron lasers (XFELs) have opened new opportunities in structural biology for studying the structural dynamics of biomolecules. Structural dynamics analysis of single protein molecules by XFEL is based on the "diffraction-before-destruction" principle. Obtaining diffraction data without sample crystallization is complicated. Nevertheless, there are published studies of single biomolecules by XFEL without crystallization. Methods: We analyzed entries from the Coherent X-ray Imaging Data Bank (http://www.cxidb.org). Among all deposited entries, we selected works that involved a biological object and did not use crystallization for its preparation. Only the following methods met these criteria: Single Particle Flash X-ray Imaging (SP-XDI), Serial Fiber Diffraction (SFD), and Fluctuation X-ray Scattering (FXS). Studies of different objects were counted as different works. Results: In total, 20 distinct studies of various biological objects without crystallization were found. Of these, two used the FXS method, three used the SFD method, and the remaining 15 used the SP-XDI method. The works were also classified by object: three studies addressed the cell structure of yeast and cyanobacteria; two investigated fibril structures (bombesin and endorphin) by the SFD method; two examined protein complexes (RNA polymerase II and carboxysomes); and the remaining 13 studied viruses of various sizes. Based on this analysis, we identified the potential problems of the field: processing of diffraction patterns and 3D reconstruction of structures, preparation of small objects (single proteins smaller than 500 kDa), and a low ratio of successful to erroneous hits.
Conclusion: The field of structural biology now has the conditions needed for the analysis of single biological molecules. A recent study (Pietrini, 2018) noted that solving problems such as reducing the electrospray aerosol droplet size would make a resolution of 5 Å attainable. This makes 3D reconstruction from XFEL diffraction patterns an extremely important area for future studies.
Background: Protein structure determination using an X-ray free-electron laser (XFEL) involves analyzing and merging a large number of snapshot diffraction patterns. Convolutional neural networks (CNNs) are widely used to solve computer vision problems such as image classification and can be applied to diffraction pattern analysis. However, protein structure determination using CNNs alone remains unsolved. Methods: We simulated diffraction patterns using the Condor software library and obtained more than 1000 diffraction patterns for each structure, with simulation parameters resembling real experimental ones. To classify the diffraction patterns, we tried two approaches that are well known in image classification: a classic VGG network and residual networks. Results: 1. Recognition of the protein class (GPCRs vs. globins). Globins and GPCR-like proteins are typical α-helical proteins. Each of these protein families has a large number of representatives (including those with known structures), but we used only 8 structures from each family. We used 12,000 diffraction patterns for training and 4,000 for testing. The results indicate that all considered networks recognize the protein family with high accuracy. 2. Recognition of the number of protein molecules in a liposome. We considered using liposomes as carriers of membrane or globular proteins for sample delivery in XFEL experiments in order to improve the X-ray beam hit rate. Three sets of diffractograms were calculated for liposomes of various radii: empty liposomes, liposomes loaded with 5 bacteriorhodopsin molecules, and liposomes loaded with 10 bacteriorhodopsin molecules. The training set consisted of 23,625 diffraction patterns and the test set of 7,875 patterns. All networks used in our study identified the number of protein molecules in the liposomes regardless of the liposome radius.
These findings make liposomes promising as protein carriers in XFEL experiments. Conclusion: The numerical experiments show that applying neural network algorithms to diffraction images from single macromolecular particles makes it possible to detect structural changes at the angstrom scale.
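Both experiments report training and test set sizes in a 3:1 ratio (12,000 vs. 4,000 and 23,625 vs. 7,875 patterns). The abstract does not say how the patterns were assigned to each set, so the sketch below simply assumes a single random shuffle followed by a 75/25 cut; the function name `train_test_split_indices` and the seed are hypothetical.

```python
import numpy as np

def train_test_split_indices(num_samples, train_fraction, seed=0):
    # Shuffle the sample indices once, then cut at the requested fraction.
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)
    num_train = int(round(num_samples * train_fraction))
    return order[:num_train], order[num_train:]

# Liposome experiment: 23,625 training + 7,875 test patterns in total.
train_idx, test_idx = train_test_split_indices(23625 + 7875, 0.75)

# Protein-family experiment: 12,000 training + 4,000 test patterns in total.
fam_train_idx, fam_test_idx = train_test_split_indices(12000 + 4000, 0.75)
```

Splitting by index rather than copying images keeps the simulated patterns in place and makes the partition reproducible from the seed alone. A stratified split per class (empty, 5, or 10 molecules) would be a natural refinement, but the abstract gives no per-class counts.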