nifPred: Proteome-Wide Identification and Categorization of Nitrogen-Fixation Proteins of Diaztrophs Based on Composition-Transition-Distribution Features Using Support Vector Machine

Meher, Prabina Kumar; Sahu, Tanmaya Kumar; Mohanty, Jyotilipsa; Gahoi, Shachi; Purru, Supriya; Grover, Monendra; Rao, Atmakuri Ramakrishna

doi:10.3389/fmicb.2018.01100

Cited by 14 publications (10 citation statements)

References 72 publications (83 reference statements)

“…One of the most important tasks in machine learning-based prediction using biological sequence data is to encode the sequences into numeric features, as machine learning algorithms (MLA) can only take numerical inputs [2,[77][78][79][80]. Further, the miRNA sequences are only 20-24 nucleotides long, which is also a limitation to generate large number of discriminative features.…”

Section: Discussion (mentioning)

confidence: 99%

ASRmiRNA: Abiotic Stress-Responsive miRNA Prediction in Plants by Using Machine Learning Algorithms with Pseudo K-Tuple Nucleotide Compositional Features

Meher, Begam, Sahu, et al. 2022. IJMS

MicroRNAs (miRNAs) play a significant role in plant response to different abiotic stresses. Thus, identification of abiotic stress-responsive miRNAs holds immense importance in crop breeding programmes to develop cultivars resistant to abiotic stresses. In this study, we developed a machine learning-based computational method for prediction of miRNAs associated with abiotic stresses. Three types of datasets were used for prediction, i.e., miRNA, Pre-miRNA, and Pre-miRNA + miRNA. The pseudo K-tuple nucleotide compositional features were generated for each sequence to transform the sequence data into numeric feature vectors. Support vector machine (SVM) was employed for prediction. The area under receiver operating characteristics curve (auROC) of 70.21, 69.71, 77.94 and area under precision-recall curve (auPRC) of 69.96, 65.64, 77.32 percentages were obtained for miRNA, Pre-miRNA, and Pre-miRNA + miRNA datasets, respectively. Overall prediction accuracies for the independent test set were 62.33, 64.85, 69.21 percentages, respectively, for the three datasets. The SVM also achieved higher accuracy than other learning methods such as random forest, extreme gradient boosting, and adaptive boosting. To implement our method with ease, an online prediction server “ASRmiRNA” has been developed. The proposed approach is believed to supplement the existing effort for identification of abiotic stress-responsive miRNAs and Pre-miRNAs.

“…nifPred, a multi NifH proteins classifier, that uses 13,500 values per sequence, involves four manual methods to obtain the components and data to be trained, having a high specificity. 14 NIFtHool was compared with two models that work with two different embedding vectors to identification of mitochondrial proteins of Plasmodium falciparum 25 and DNA-binding proteins. 23 The three models showed positive results in sensitivity, specificity, and accuracy, so the selection of the embedding vector method was adequate.…”

Section: Discussion (mentioning)

confidence: 99%

“…However, their investigations did not produce a computer tool. Meher et al (2018) developed nifPred, a machine learning (ML) software, to perform a sequence classification into NifH or non-NifH proteins. This informatics tool converts multi-categorical sort of gene sequences into one of the six types of the Nif proteins encoded by the nif operon using a high computational performance.…”

Section: Introduction (mentioning)

confidence: 99%

“…This informatics tool converts multi-categorical sort of gene sequences into one of the six types of the Nif proteins encoded by the nif operon using a high computational performance. 14 Constant supervision is necessary to guide the program in all phases of the system, which causes the increment of computational cost. 15 Nowadays, some algorithms are registered in literature to make predictions of NifH proteins based on gene sequences.…”

Section: Introduction (mentioning)

confidence: 99%

NIFtHool: an informatics program for identification of NifH proteins using deep neural networks

et al. 2022

Atmospheric nitrogen fixation carried out by microorganisms has environmental and industrial importance, related to the increase of soil fertility and productivity. The present work proposes the development of a new high precision system that allows the recognition of amino acid sequences of the nitrogenase enzyme (NifH) as a promising way to improve the identification of diazotrophic bacteria. For this purpose, a database obtained from UniProt built a processed dataset formed by a set of 4911 and 4782 amino acid sequences of the NifH and non-NifH proteins respectively. Subsequently, the feature extraction was developed using two methodologies: (i) k-mers counting and (ii) embedding layers to obtain numerical vectors of the amino acid chains. Afterward, for the embedding layer, the data was crossed by an external trainable convolutional layer, which received a uniform matrix and applied convolution using filters to obtain the feature maps of the model. Finally, a deep neural network was used as the primary model to classify the amino acid sequences as NifH protein or not. Performance evaluation experiments were carried out, and the results revealed an accuracy of 96.4%, a sensitivity of 95.2%, and a specificity of 96.7%. Therefore, an amino acid sequence-based feature extraction method that uses a neural network to detect N-fixing organisms is proposed and implemented. NIFtHool is available from: https://nifthool.anvil.app/
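
The k-mer counting step mentioned in this abstract can be illustrated with a minimal sketch. It is not the NIFtHool implementation: the function name `kmer_counts`, the 20-letter alphabet constant, the choice of k = 2, and the example peptide are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard amino acids define the k-mer vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_counts(sequence, k=2):
    """Return a fixed-length vector of overlapping k-mer counts.

    The vector has len(AMINO_ACIDS) ** k entries, ordered by the
    Cartesian product of the alphabet. K-mers containing any other
    symbol (for example 'X') are not in the vocabulary and are skipped.
    """
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts[kmer] for kmer in vocabulary]


# Hypothetical peptide, used only to show the call.
vector = kmer_counts("MAMRQCAIYGK", k=2)
```

Every sequence maps to a vector of the same length regardless of its own length, which is what allows a classifier such as a neural network or an SVM to take the counts as input.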

“…A study of 2000 complete genomes available in 2012 led to proposing NifHDKENB ( Figure 1 ) as the minimum criteria for computational prediction of diazotrophy [ 20 ]. This six gene criterion has been widely used as diagnostic for diazotrophy in culture-independent studies [ 21 , 22 , 23 , 24 ].…”

Section: Introduction (mentioning)

confidence: 99%

Phylogeny of Nitrogenase Structural and Assembly Components Reveals New Insights into the Origin and Distribution of Nitrogen Fixation across Bacteria and Archaea

Koirala, Brözel. 2021. Microorganisms

The phylogeny of nitrogenase has only been analyzed using the structural proteins NifHDK. As nifHDKENB has been established as the minimum number of genes necessary for in silico prediction of diazotrophy, we present an updated phylogeny of diazotrophs using both structural (NifHDK) and cofactor assembly proteins (NifENB). Annotated Nif sequences were obtained from InterPro from 963 culture-derived genomes. Nif sequences were aligned individually and concatenated to form one NifHDKENB sequence. Phylogenies obtained using PhyML, FastTree, RapidNJ, and ASTRAL from individuals and concatenated protein sequences were compared and analyzed. All six genes were found across the Actinobacteria, Aquificae, Bacteroidetes, Chlorobi, Chloroflexi, Cyanobacteria, Deferribacteres, Firmicutes, Fusobacteria, Nitrospira, Proteobacteria, PVC group, and Spirochaetes, as well as the Euryarchaeota. The phylogenies of individual Nif proteins were very similar to the overall NifHDKENB phylogeny, indicating the assembly proteins have evolved together. Our higher resolution database upheld the three cluster phylogeny, but revealed undocumented horizontal gene transfers across phyla. Only 48% of the 325 genera containing all six nif genes are currently supported by biochemical evidence of diazotrophy. In addition, this work provides reference for any inter-phyla comparison of Nif sequences and a quality database of Nif proteins that can be used for identifying new Nif sequences.
