Background and objective: Diabetes is a common chronic disease, often called the “second killer” among modern diseases. There is currently no medical cure; medication can only serve as auxiliary treatment, and many diabetic patients still die each year. In addition, a considerable number of people neglect their physical health or forgo treatment for lack of money, which eventually leads to various complications. Diagnosing diabetes at an early stage and intervening early is therefore necessary, and an early detection method is essential. Methods: This study proposes a diabetes prediction model based on Boruta feature selection and ensemble learning. The model uses Boruta feature selection to extract salient features from the dataset, the K-Means++ algorithm for unsupervised clustering of the data, and a stacking ensemble learning method for classification. Results: The experiments were performed on the PIMA Indian diabetes dataset, and the model was evaluated by accuracy, precision and F1 index. The model reaches an accuracy of 98%. Conclusion: Compared with other diabetes prediction models, this model achieved better results in diabetes prediction.
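The evaluation measures named in this abstract (accuracy, precision and F1) can be illustrated with a minimal sketch. This is not the authors' code: the function name `binary_metrics`, the label convention (1 = diabetic, 0 = non-diabetic) and the toy labels are assumptions made purely for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision and F1 for binary labels.

    Assumption for illustration: 1 marks the positive (diabetic) class.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy labels, not data from the study.
    truth = [1, 1, 1, 0, 0, 0, 0, 1]
    predicted = [1, 1, 0, 0, 0, 1, 0, 1]
    print(binary_metrics(truth, predicted))
```

F1 is the harmonic mean of precision and recall, so it is reported alongside accuracy when the two classes are not equally frequent.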
encoding the association between protein sequence and three-dimensional structure for a small heterologous training set of small proteins. In the present study, we report the application of this approach to a selected homologous training set of 8 proteins using the Cray 2 supercomputer at the Minnesota Supercomputer Center, Minneapolis, USA. The large memory of this machine allowed us to configure a network with more than .3 million connections and 30,000 neural units; a network of this size was necessary to accommodate a new training/testing set with 8 proteins of up to 140 amino acid residues. This training set was constructed to investigate the performance of the neural network approach in predicting structures within the protease class of proteins; proteases are enzymes that cleave the peptide bonds joining individual amino acid residues of other proteins. The network learned the sequence-structure association for 4 of the proteins within 100 iterations, with the proteins presented in a random order and shifted by a random offset to the left or to the right. When presented with novel sequences from related proteins, the network was able to predict three-dimensional structures of the four proteins in the testing set. The results of this study suggest that a neural network trained to recognize the entire sequence of a protein using the shift-learn method can retain some of the rules of protein folding in a form that allows prediction of three-dimensional structures. Our findings indicate that large scalar or vector supercomputer architectures are ideal for implementation of useful backpropagation neural networks.
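The abstract describes presenting sequences in a random order, each shifted by a random offset to the left or right. The sketch below shows one possible reading of that idea, not the authors' implementation: the fixed-width input window, the padding symbol `PAD` and both function names are hypothetical, and the abstract does not say how the shift was actually encoded.

```python
import random

PAD = "-"  # hypothetical blank symbol for unused input positions


def shift_into_window(sequence, window, rng):
    """Place `sequence` in a fixed-width window at a random horizontal offset.

    Returns the padded string and the offset that was chosen.
    """
    if len(sequence) > window:
        raise ValueError("sequence is longer than the input window")
    slack = window - len(sequence)
    offset = rng.randint(0, slack)  # randint is inclusive on both ends
    padded = PAD * offset + sequence + PAD * (slack - offset)
    return padded, offset


def shuffled_shifted_epoch(sequences, window, rng):
    """Return one training pass: random order, each sequence randomly shifted."""
    order = list(sequences)
    rng.shuffle(order)
    return [shift_into_window(seq, window, rng)[0] for seq in order]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy residue strings, not sequences from the study.
    toy_sequences = ["ACDEFG", "HIKLM", "NPQRSTVW"]
    for row in shuffled_shifted_epoch(toy_sequences, window=12, rng=rng):
        print(row)
```

Under this reading, the same sequence reaches the network at different input positions across iterations, so the learned association cannot depend on absolute position in the window.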
Multi-label text classification can accurately and quickly assign text to related categories or topics and helps people quickly locate the content they need in massive information resources, which makes it highly significant in practice. Traditional classification algorithms suffer from low classification accuracy because of low correlation among data labels, unbalanced label data and few feature words in short texts. This paper therefore first performs hierarchical pre-processing on the label data to transform multi-label classification into hierarchical multi-class text classification. It also proposes an improved multi-label classification algorithm, Multi-label Convolutional Neural Networks (ML-CNN). Based on the TensorFlow framework, a CNN model is designed, and a separate training model is constructed for each level of label classification. According to the number of classification levels, the output of the upper-level label is stitched to the tail of the original input to form the input of the next level. Experiments on the labeled description information of 500,000 Chinese products show that the improved algorithm significantly improves classification accuracy, with the accuracy at each level exceeding 88%, which demonstrates the feasibility and effectiveness of the algorithm.
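The step in which the upper-level label is stitched to the tail of the original input can be sketched as follows. This is an illustrative outline only: the function names, the `__label__` marker and the callable stand-in classifiers are hypothetical, and the sketch assumes that each level receives the original tokens plus only the immediately preceding level's prediction, which the abstract does not spell out. The actual ML-CNN uses CNN models built with TensorFlow.

```python
def stitch_parent_label(tokens, parent_label, marker="__label__"):
    """Append the upper-level prediction to the tail of the token sequence."""
    return list(tokens) + [f"{marker}{parent_label}"]


def classify_hierarchically(tokens, level_classifiers):
    """Run one classifier per label level and return the predicted label path.

    `level_classifiers` is a list of callables (stand-ins for per-level
    models) that each map a token list to a single label.
    """
    current_input = list(tokens)
    path = []
    for classify in level_classifiers:
        label = classify(current_input)
        path.append(label)
        # Next level sees the original input with this level's label appended.
        current_input = stitch_parent_label(tokens, label)
    return path


if __name__ == "__main__":
    # Toy rule-based stand-ins for trained per-level models.
    def level_one(toks):
        return "electronics" if "phone" in toks else "other"

    def level_two(toks):
        return "mobile" if "__label__electronics" in toks else "misc"

    print(classify_hierarchically(["red", "phone"], [level_one, level_two]))
```

Passing the upper-level label forward lets the lower-level classifier condition on the coarse category instead of re-deriving it from a short text with few feature words.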