A Novel Technique of Feature Extraction with Dual Similarity Measures for Protein Sequence Classification

Bharill, Neha; Tiwari, Aruna; Rawat, Anshul

doi:10.1016/j.procs.2015.04.217

Cited by 8 publications

(15 citation statements)

References 9 publications

(21 reference statements)

“…This is usually circumvented by aligning the sequences [26] or, in cases where aligning is impossible, with feature extraction, i.e. representing the sequences as a feature vector reflecting their properties [22][23][24][25][27][28][29]. Unfortunately, the resulting feature vectors are inherently biased by the method of feature extraction used [30].…”

mentioning

confidence: 99%

AnOxPePred: using deep learning for the prediction of antioxidative properties of peptides

Olsen

Yeşiltaş

Marin

et al. 2020

Sci Rep

Dietary antioxidants are an important preservative in food and have been suggested to help in disease prevention. With consumer demands for less synthetic and safer additives in food products, the food industry is searching for antioxidants that can be marketed as natural. Peptides derived from natural proteins show promise, as they are generally regarded as safe and potentially contain other beneficial bioactivities. Antioxidative peptides are usually obtained by testing various peptides derived from hydrolysis of proteins by a selection of proteases. This slow and cumbersome trial-and-error approach to identify antioxidative peptides has increased interest in developing computational approaches for prediction of antioxidant activity and thereby reduce laboratory work. A few antioxidant predictors exist, however, no tool predicting the antioxidative properties of peptides is, to the best of our knowledge, currently available as a web-server. We here present the AnOxPePred tool and web-server (http://services.bioinformatics.dtu.dk/service.php?AnOxPePred-1.0) that uses deep learning to predict the antioxidant properties of peptides. Our model was trained on a curated dataset consisting of experimentally-tested antioxidant and non-antioxidant peptides. For a variety of metrics our method displays a prediction performance better than a k-NN sequence identity-based approach. Furthermore, the developed tool will be a good benchmark for future predictors of antioxidant peptides.

“…This implies that how these protein sequences can be represented in terms of feature vectors so that these feature vectors can be applied as an input to the clustering algorithm. For this purpose, we have used the encoding technique presented in [53] that entails the extraction of six features corresponding to each protein sequence.…”

Section: B Datasets

mentioning

confidence: 99%

A Generalized Enhanced Quantum Fuzzy Approach for Efficient Data Clustering

et al. 2019

Self Cite

Data clustering is a challenging task to gain insights into data in various fields. In this paper, an Enhanced Quantum-Inspired Evolutionary Fuzzy C-Means (EQIE-FCM) algorithm is proposed for data clustering. In the EQIE-FCM, quantum computing concept is utilized in combination with the FCM algorithm to improve the clustering process by evolving the clustering parameters. The improvement in the clustering process leads to improvement in the quality of clustering results. To validate the quality of clustering results achieved by the proposed EQIE-FCM approach, its performance is compared with the other quantum-based fuzzy clustering approaches and also with other evolutionary clustering approaches. To evaluate the performance of these approaches, extensive experiments are being carried out on various benchmark datasets and on the protein database that comprises of four superfamilies. The results indicate that the proposed EQIE-FCM approach finds the optimal value of fitness function and the fuzzifier parameter for the reported datasets. In addition to this, the proposed EQIE-FCM approach also finds the optimal number of clusters and more accurate location of initial cluster centers for these benchmark datasets. Thus, it can be regarded as a more efficient approach for data clustering.

INDEX TERMS: Clustering, quantum computing, evolutionary algorithm, fuzzy set theory, bioinformatics.

“…Gupta et al [28] used the general version of Chou [29] pseudo amino acid composition, which is a sixty-dimensional numerical feature vector of protein sequences, to develop an alignment-free approach for finding similarity across protein sequences. Bharill et al [30] developed an approach to extract six-dimensional numerical feature vectors from a protein sequence. Likewise, many feature extraction techniques [23,20,24,31,25,30,28] have been introduced in the past, but, none of them is scalable.…”

Section: Introduction

mentioning

confidence: 99%

“…Bharill et al [30] developed an approach to extract six-dimensional numerical feature vectors from a protein sequence. Likewise, many feature extraction techniques [23,20,24,31,25,30,28] have been introduced in the past, but, none of them is scalable. A scalable approach for selecting statistically relevant characteristics from a large sequence is required.…”

Section: Introduction

mentioning

confidence: 99%

A Novel Scalable Apache Spark Based Feature Extraction Approaches for Huge Protein Sequence and their Clustering Performance Analysis

Jha

Tiwari

Bharill

et al. 2022

Preprint

Self Cite

Genome sequencing projects are rapidly increasing the number of high-dimensional protein sequence datasets. Clustering a high-dimensional protein sequence dataset using traditional machine learning approaches poses many challenges. Many different feature extraction methods exist and are widely used. However, extracting features from millions of protein sequences becomes impractical because they are not scalable with current algorithms. Therefore, there is a need for an efficient feature extraction approach that extracts significant features.
