2020
DOI: 10.22541/au.159405844.42929954
Preprint

Identifying DNA-binding proteins based on multi-features fusion and LASSO feature selection

Abstract: DNA-binding proteins, which perform an indispensable function in the maintenance of genetic information and are significant for biomedical research, are identified inefficiently by traditional experimental methods because of their sheer number. By contrast, machine learning, as an emerging technique, offers satisfactory speed and decent accuracy. Thus, this work focuses on extracting four different features from primary and secondary sequence features, i.e., RS, PseAACS, PSSM-ACCT and PSSM…

Cited by 10 publications (13 citation statements)
References 0 publications
“…This method not only helps ensure the characteristic vector size for protein sequences but also improves the efficiency of the subsequent machine learning model. There is also an auto-cross covariance transform and discrete wavelet transform used by Zhang et al. [13]. Due to the rather large PSSM data with a size of L × 20 (L is the protein chain length), Chen et al. [15] proposed a secondary structure element (SSE)-PSSM method that not only reduces the size of the feature vector but also improves performance of predicting the secondary structure of proteins, with the most important modification being the SSE transformation. After searching for strings that are similar to the query string, they first converted the strings to an SSE form and then computed the position propensity matrix and finally the PSSM.…”
Section: Methods | Citation type: mentioning
confidence: 99%
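
The statement above notes that a PSSM has size L × 20, so its size varies with chain length, and that a covariance transform yields a fixed-size vector. The following is a minimal sketch of the auto-covariance part only (no cross-covariance terms), assuming the commonly used definition in which each column is centred on its mean and averaged over residue pairs separated by a given lag. The function name `auto_covariance` and the parameter `max_lag` are hypothetical, and this is not the cited authors' implementation.

```python
from typing import List


def auto_covariance(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Map an L x C score matrix to a vector of length C * max_lag.

    Assumption: pssm is a list of L rows (one per residue), each with C
    columns (C = 20 for a standard PSSM).
    """
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_cols = len(pssm[0])

    # Mean score of each column over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    features = []
    for j in range(n_cols):
        for lag in range(1, max_lag + 1):
            # Average product of centred scores for residues `lag` apart.
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features
```

With this sketch, two proteins of different lengths map to vectors of the same length (20 × `max_lag` for a 20-column matrix), which is the property the statement highlights.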
“…(2016) explored the features of the popular regression methods: OLS regression, ridge regression and LASSO regression. Zhang et al. (2021) used the LASSO dimensionality reduction method to conduct experiments on the combination of feature sub-models to obtain the best top-level feature number, thus providing support for the effective prediction of DNA-binding proteins.…”
Section: Related Work | Citation type: mentioning
confidence: 99%
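
As a rough illustration of how LASSO can act as a feature selector, the sketch below minimises (1/2n)‖y − Xw‖² + α‖w‖₁ by cyclic coordinate descent with soft-thresholding and keeps the features whose coefficients stay non-zero. The function names, the `alpha` value, and the toy data are assumptions for illustration; the statement above does not specify the solver or settings that were used.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(
    X: List[List[float]], y: List[float], alpha: float, n_iter: int = 200
) -> List[float]:
    """Minimise (1/2n)||y - Xw||^2 + alpha * ||w||_1 one coordinate at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw because w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # an all-zero column can never be selected
            # Correlation of column j with the residual that ignores w[j].
            rho = sum(col[i] * (residual[i] + w[j] * col[i]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * col[i]
                w[j] = new_w
    return w


def selected_features(w: List[float], tol: float = 1e-8) -> List[int]:
    """Indices of the features whose coefficients were not shrunk to zero."""
    return [j for j, v in enumerate(w) if abs(v) > tol]


# Toy data: the target depends on the first column only.
X_demo = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
y_demo = [2.0, 2.0, -2.0, -2.0]
w_demo = lasso_coordinate_descent(X_demo, y_demo, alpha=0.5)
```

On this toy input only the first coefficient survives the penalty, so `selected_features(w_demo)` keeps column 0 alone. Larger `alpha` values remove more features, which is how the penalty can be used to tune the number of retained features.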
“…The feature subsets generated by FS methods for classification have two main goals, which are maximizing the classification accuracy (minimizing the classification error) and minimizing the number of selected features. As a mainstream classifier, K nearest neighbor (KNN) [56][57][58] is utilized for FS due to its advantages of simplicity and insensitivity to noisy data. Furthermore, how to reduce the number of selected features is considered another core issue.…”
Section: E Definition Of Objective Function | Citation type: mentioning
confidence: 99%
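
The two goals in the statement above (low classification error, few selected features) are often folded into one score. The sketch below assumes a weighted sum of the leave-one-out KNN error rate and the fraction of features kept. The weight of 0.99, the leave-one-out protocol, and all function names are assumptions for illustration, not details taken from the citing paper.

```python
from collections import Counter
from math import dist
from typing import List, Sequence


def knn_predict(train_X: List[List[float]], train_y: list,
                query: Sequence[float], k: int):
    """Majority label among the k training points nearest to query."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def loo_error(X: List[List[float]], y: list, mask: Sequence[int], k: int = 1) -> float:
    """Leave-one-out KNN error rate using only the features where mask is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 1.0  # an empty subset is treated as the worst case
    reduced = [[row[j] for j in cols] for row in X]
    errors = 0
    for i in range(len(reduced)):
        rest_X = reduced[:i] + reduced[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, reduced[i], k) != y[i]:
            errors += 1
    return errors / len(reduced)


def fitness(X: List[List[float]], y: list, mask: Sequence[int],
            weight: float = 0.99, k: int = 1) -> float:
    """Lower is better: weighted error rate plus weighted feature ratio."""
    ratio = sum(mask) / len(mask)
    return weight * loo_error(X, y, mask, k) + (1.0 - weight) * ratio


# Toy data: the first feature separates the classes, the second is noise.
X_toy = [[0.0, 5.0], [0.1, 1.0], [1.0, 4.0], [1.1, 0.0]]
y_toy = [0, 0, 1, 1]
```

On the toy data, `fitness(X_toy, y_toy, [1, 0])` is lower than `fitness(X_toy, y_toy, [1, 1])`, because dropping the noisy feature removes the classification errors and also shrinks the subset.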