2015
DOI: 10.1109/tnb.2015.2431292

Imbalanced Protein Data Classification Using Ensemble FTM-SVM

Abstract: Classification of protein sequences into functional and structural families based on machine learning methods is a hot research topic in machine learning and bioinformatics. In fact, the underlying protein classification problem is a huge multiclass problem. Generally, the multiclass problem can be reduced to a set of binary classification problems. The proteins in one class are seen as positive examples while those outside the class are seen as negative examples. However, the class imbalance problem will arise…
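
The one-vs-rest reduction described in the abstract can be illustrated with a minimal sketch. It is not the paper's EnFTM-SVM method; it only shows why treating one family as positive and all others as negative tends to produce imbalanced binary problems. The family names and sizes are made up for illustration, and the helper names `one_vs_rest_labels` and `imbalance_ratio` are hypothetical.

```python
from collections import Counter


def one_vs_rest_labels(families, target):
    """Label members of `target` as +1 and every other protein as -1."""
    return [1 if family == target else -1 for family in families]


def imbalance_ratio(labels):
    """Return the number of negative examples per positive example."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive examples for this class")
    return counts[-1] / counts[1]


# Hypothetical toy data set: 100 proteins spread over three families.
families = ["kinase"] * 5 + ["protease"] * 20 + ["transporter"] * 75

# Each family yields its own binary problem with its own imbalance.
for target in sorted(set(families)):
    labels = one_vs_rest_labels(families, target)
    print(target, imbalance_ratio(labels))
```

In this toy setting the smallest family faces the most negatives per positive, which is the situation the abstract refers to as the class imbalance problem.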

Cited by 17 publications
(11 citation statements)
References 59 publications
“…This was partly due to the imbalanced training sets with the numbers of non-members greatly surpassing those of the members. Imbalanced training sets were known to adversely affect the machine learning prediction performance, particularly the minority class [ 48 , 49 ]. Moreover, not all functional families were sufficiently covered by the known proteins, particularly those with < 100 known protein members, the inadequate coverage of the respective training sets likely affect SE s to varying degrees.…”
Section: Results (mentioning)
confidence: 99%
“…One useful strategy for overcoming the imbalanced datasets problem is to re-construct the training sets into more balanced ones by either over sampling the minority class [ 48 ] or under sampling the majority one [ 49 ], which might compromise the training datasets by introducing noises to the minority class or reducing the diversity of the majority one. In SVM-Prot, the training sets of the non-members were constructed from the minimal set of representative proteins from the Pfam domain families.…”
Section: Results (mentioning)
confidence: 99%
“…K-means clustering algorithm is applied on books circulation records to analyze readers' characteristics [4]. Artificial neural network techniques are applied for library management models [5]. A neural network based assessment method is developed and applied to assess college library website [6].…”
Section: Related Work (mentioning)
confidence: 99%
“…Classification of imbalanced protein data by using ensemble classifier technique EnFTM-SVM is proposed in [5]. This is an ensemble of fuzzy total margin support vector machine (FTN-SVM).…”
Section: Related Work (mentioning)
confidence: 99%
“…Here we have applied SVM as a tool to provide classification of endemic disease potential areas. Recently support vector machines, developed by Vapnik [20], have been used for a range of problems including pattern recognition [21,22], bioinformatics [23,24], and text categorization [25,26]. The use of classification in this facet and in medical diagnosis has been gradually increasing [27][28][29][30].…”
Section: Classification Of Regions With Endemic Diseases Based On Tra (mentioning)
confidence: 99%