Biological systems are highly organized and tightly coordinated while maintaining great complexity. The growth of secondary data
generation and the progress of modern data mining techniques give us an opportunity to discover hidden intra- and inter-relations within these
nonlinear datasets, which helps in understanding complex biological phenomena more efficiently. In this paper, we report a comparative
classification of Pyruvate Dehydrogenase protein sequences from bacterial sources based on 28 physicochemical parameters (such as
bulkiness, hydrophobicity, total positively and negatively charged residues, α-helices, and β-strands) and the compositions of the 20 amino acids.
Logistic, MLP (Multilayer Perceptron), SMO (Sequential Minimal Optimization), RBFN (Radial Basis Function Network), and SL (Simple
Logistic) methods were compared in this study. MLP was found to be the best method, with a maximum average accuracy of 88.20%. The same dataset
was then clustered using a 2×2 grid of a two-dimensional SOM (Self-Organizing Map). Clustering analysis revealed the proximity of the
unannotated sequences to the Mycobacterium and Synechococcus genera.
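
To make the two feature and clustering ideas above concrete, the following is a minimal sketch, not the authors' implementation. It assumes the amino acid composition is the percentage of each of the 20 standard residues in a sequence, and it trains a very small 2×2 SOM with a Gaussian neighbourhood and linearly decaying learning rate. The function names, the training parameters, and the three toy sequences are all hypothetical and chosen only for illustration; the 28 physicochemical parameters are not computed here.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the percentage of each of the 20 standard amino acids in seq."""
    seq = seq.upper()
    counted = [r for r in seq if r in AMINO_ACIDS]  # ignore X, B, Z, gaps, etc.
    if not counted:
        raise ValueError("sequence contains no standard residues")
    return [100.0 * counted.count(a) / len(counted) for a in AMINO_ACIDS]


def best_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    dists = [sum((w - v) ** 2 for w, v in zip(unit, x)) for unit in weights]
    return dists.index(min(dists))


def train_som(data, rows=2, cols=2, epochs=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each unit from a randomly chosen training vector.
    weights = [list(rng.choice(data)) for _ in grid]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays to zero
        sigma = sigma0 * (1.0 - frac) + 0.01     # neighbourhood radius shrinks
        for x in rng.sample(data, len(data)):    # shuffled pass over the data
            b = best_unit(weights, x)
            for i, pos in enumerate(grid):
                d2 = (pos[0] - grid[b][0]) ** 2 + (pos[1] - grid[b][1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                weights[i] = [w + lr * h * (v - w) for w, v in zip(weights[i], x)]
    return weights


# Made-up toy sequences, not taken from the study's dataset.
toy = {"seq_a": "MKKLLAAGGA", "seq_b": "MKKLLAAGGV", "seq_c": "WWYYFFHHPP"}
features = {name: amino_acid_composition(s) for name, s in toy.items()}
som = train_som(list(features.values()))
clusters = {name: best_unit(som, f) for name, f in features.items()}
print(clusters)  # maps each sequence name to a unit index in 0..3
```

In such a setup, an unannotated sequence would be assigned to the unit returned by best_unit, and its proximity to annotated sequences could be read from which of them share that unit or a neighbouring one.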