DNA-binding proteins (DBPs) play crucial roles in numerous cellular processes, including nucleotide recognition, transcriptional control and the regulation of gene expression. The majority of existing computational techniques for identifying DBPs are mainly applicable to human and mouse datasets. Even though some models have been tested on Arabidopsis, they produce poor accuracy when applied to other plant species. Therefore, it is imperative to develop an effective computational model for predicting plant DBPs. In this study, we developed a comprehensive computational model for plant-specific DBP identification. Five shallow learning and six deep learning models were initially used for prediction, and the shallow learning methods outperformed the deep learning algorithms. In particular, the support vector machine achieved the highest repeated 5-fold cross-validation performance, with 94.0% area under the receiver operating characteristic curve (AUC-ROC) and 93.5% area under the precision-recall curve (AUC-PR). On an independent dataset, the developed approach achieved 93.8% AUC-ROC and 94.6% AUC-PR. When compared with existing state-of-the-art tools on an independent dataset, the proposed model achieved much higher accuracy. Overall, the results suggest that the developed computational model is more efficient and reliable than existing models for the prediction of DBPs in plants. For the convenience of experimental scientists, the developed prediction server PlDBPred is publicly accessible at https://iasri-sg.icar.gov.in/pldbpred/. The source code is also provided at https://iasri-sg.icar.gov.in/pldbpred/source_code.php for prediction on large datasets.
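The two metrics reported in the abstract above can be illustrated with a minimal, standard-library sketch. This is not the PlDBPred code: the function names `auc_roc` and `average_precision` are hypothetical, and average precision is only one common estimate of AUC-PR, so the authors may have used a different estimator. The sketch assumes binary labels (1 for a DBP, 0 otherwise), at least one example of each class, and, for the precision-recall summary, distinct scores.

```python
def auc_roc(labels, scores):
    """AUC-ROC via the rank-sum (Mann-Whitney U) formulation.

    Tied scores share their average rank, so a tied positive/negative
    pair counts as half a correct ordering.
    """
    pairs = sorted(zip(scores, labels))  # ascending by score
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j to the end of the block of tied scores starting at i.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Average precision: mean of precision at the rank of each positive.

    Assumes distinct scores (ties are broken arbitrarily by sort order).
    """
    order = sorted(range(len(scores)), key=lambda idx: -scores[idx])
    true_positives = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / true_positives
```

For the toy input `labels = [1, 1, 0, 1, 0, 0]` and `scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]`, `auc_roc` gives 8/9 and `average_precision` gives 11/12. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute the same quantities.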
RNA-binding proteins (RBPs) are essential for post-transcriptional gene regulation in eukaryotes, including splicing control, mRNA transport and decay. Accurate identification of RBPs is therefore important for understanding gene expression and the regulation of cell state. A number of computational models have been developed to detect RBPs. These methods used datasets from several eukaryotic species, mainly mice and humans. Although some models have been tested on Arabidopsis, they fall short of correctly identifying RBPs in other plant species. Therefore, a powerful computational model for identifying plant-specific RBPs is needed. In this study, we present a novel computational model for identifying RBPs in plants. Five deep learning models and ten shallow learning algorithms were used for prediction with 20 sequence-derived and 20 evolutionary feature sets. The light gradient boosting machine achieved the highest repeated five-fold cross-validation performance, with 91.24% AU-ROC and 91.91% AU-PRC. When evaluated on an independent dataset, the developed approach achieved 94.00% AU-ROC and 94.50% AU-PRC. The proposed model achieved significantly higher accuracy for predicting plant-specific RBPs than currently available state-of-the-art RBP prediction models. Although certain models have already been trained and assessed on the model organism Arabidopsis, this is the first comprehensive computational model for the discovery of plant-specific RBPs. The web server RBPLight was also developed and is publicly accessible at https://iasri-sg.icar.gov.in/rbplight/, so that researchers can conveniently identify RBPs in plants.
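The repeated five-fold cross-validation mentioned in the abstract above can be sketched as an index-splitting routine. This is an illustrative assumption rather than the RBPLight implementation: the helper name `repeated_stratified_kfold`, the use of stratification, the number of repeats and the seed are all hypothetical, since the abstract does not specify them.

```python
import random


def repeated_stratified_kfold(labels, k=5, repeats=3, seed=0):
    """Yield (repeat, fold, train_indices, test_indices) tuples.

    Each repeat reshuffles the data and deals the indices of every class
    round-robin into k folds, so class proportions stay roughly equal
    across folds. Every fold serves as the test set exactly once per repeat.
    """
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    for repeat in range(repeats):
        folds = [[] for _ in range(k)]
        for label in sorted(by_class):
            indices = by_class[label][:]
            rng.shuffle(indices)
            for position, idx in enumerate(indices):
                folds[position % k].append(idx)
        for fold in range(k):
            test = sorted(folds[fold])
            train = sorted(
                idx for other in range(k) if other != fold for idx in folds[other]
            )
            yield repeat, fold, train, test
```

A model would be fitted on each `train` split and scored on the matching `test` split, with the metric averaged over all `k * repeats` splits. scikit-learn's `RepeatedStratifiedKFold` provides the same kind of splitter.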
The Garib Kalyan Rojgar Abhiyaan (GKRA) initiative was announced by the Government of India in June 2020 for a period of 125 days to create livelihood opportunities for returned migrants in their home states and to sustain rural development. GKRA was a convergent effort among 12 ministries/departments, covering 25 activities across 116 districts in 6 states. Because multiple stakeholders were involved, an IT platform was created at the central level (https://gkra.nic.in) for effective monitoring and data exchange, and all departments exchanged data through it for 15 consecutive weeks. The Indian Council of Agricultural Research (ICAR) organized skill development training programmes for livelihood through Krishi Vigyan Kendras (KVKs). E-governance of these training programmes and the exchange of data with the central GKRA portal were managed through a module developed during July-October 2020 under the KVK Knowledge Network Portal (https://kvk.icar.gov.in) at ICAR-Indian Agricultural Statistics Research Institute and hosted at the ICAR Data Centre, ICAR-IASRI, New Delhi. The training programmes emphasized integrated farming to support livelihoods. Overall, the participants were satisfied with the quality of the training programmes conducted by KVKs. KVKs also provided participants with handholding support in adopting technology and setting up their own ventures. The IT platforms provided efficient storage and retrieval of data, a framework for exchanging information and enhanced visibility for the initiative.
Next-generation sequencing (NGS) technology generates a large amount of genomic data, and dealing with genomic data in all of its forms has become a priority for bioinformatics. This article focuses on the use of data mining in bioinformatics. It first summarizes the definition and relevant research content of bioinformatics and outlines the processing flow of bioinformatics data. It then introduces data mining from the perspective of data preprocessing, dimension reduction and statistical machine learning in bioinformatics. Applications of data mining in bioinformatics are discussed, along with some of the current obstacles and prospects in bioinformatics data mining.