2019
DOI: 10.3390/ijms20225640

Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study

Abstract: Work aiming to unravel the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of using descriptors from the amino acid index database to model and predict the fitness value of a polypeptide chain. This approach includes the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ) from the encoding of the amino acid residues, (ii) determining an extended numerical sequence (Ext_SEQ)…
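Steps (i) and (ii) can be illustrated with a minimal sketch. It assumes, following the citation statement further down that describes Ext_SEQ as "a concatenation of multiple indices", that Ext_SEQ is simply the Q elementary sequences placed end to end; the truncated abstract may describe further processing. The first index is the Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101). `CHARGE` is a hypothetical, simplified index made up for illustration, and the names `ele_seq` and `ext_seq` are likewise hypothetical.

```python
# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Hypothetical, simplified side-chain charge index (NOT an AAindex entry):
# -1 for D/E, +1 for K/R, 0 for every other residue.
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})


def ele_seq(sequence, index):
    """Elementary numerical sequence: the index value of each residue, in order."""
    return [index[residue] for residue in sequence]


def ext_seq(sequence, indices):
    """Extended sequence (assumed): the Q elementary sequences concatenated."""
    extended = []
    for index in indices:
        extended.extend(ele_seq(sequence, index))
    return extended


print(ext_seq("ACDK", [HYDROPATHY, CHARGE]))
# [1.8, 2.5, -3.5, -3.9, 0.0, 0.0, -1.0, 1.0]
```

With Q = 2 indices and a 4-residue peptide, the extended sequence has 2 × 4 = 8 values.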

Cited by 9 publications (15 citation statements)
References 47 publications
“…[48,72] A structure-independent mutant library screening machine learning approach termed innov'SAR appeared recently.[73] The innov'SAR[37,41,74,75] pipeline applies featurization by Fourier-transforming numerical indices, which represent physicochemical and biochemical properties for each amino acid, taken from the amino acid index (AAindex) database.[76,77] After fast Fourier transform (FFT) processing, a spectral form of the protein is generated and used as the input for subsequent statistical modeling.…”
Section: Introduction · mentioning
confidence: 99%
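The Fourier step that turns a numerical sequence into a "spectral form" can be sketched as a magnitude spectrum, X_k = sum over t of x_t · exp(-2πi·k·t/n), keeping |X_k| for k = 0..n//2. This is an assumption about the exact form: the quoted text does not say whether magnitudes, power, or normalized values are used. For readability the sketch computes a direct O(n²) DFT; an FFT routine such as `numpy.fft.rfft` returns the same n//2 + 1 bins in O(n log n). The function name `spectrum` is hypothetical.

```python
import cmath
import math


def spectrum(signal):
    """One-sided magnitude spectrum |X_k|, k = 0..n//2, of a real sequence.

    Direct DFT for clarity; a production pipeline would call an FFT routine.
    """
    n = len(signal)
    magnitudes = []
    for k in range(n // 2 + 1):
        x_k = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        magnitudes.append(abs(x_k))
    return magnitudes


# Kyte-Doolittle hydropathy values of the residues of the toy peptide "MKLV".
numerical = [1.9, -3.9, 3.8, 4.2]
print([round(m, 3) for m in spectrum(numerical)])  # [6.0, 8.32, 5.4]
```

Bin 0 is the plain sum of the index values; the higher bins capture how the property oscillates along the chain, which is how the spectrum reflects interactions between residues rather than single positions.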
“…The ML procedure relies on the encoding phase, the modelling phase comprising a digital signal process (Fourier Transform), and the predictive phase. All steps from data encoding to model building with the implementation of the whole machine learning procedure, and model evaluation have been described in detail in previous papers and in experimental application case studies [7,15,16], including multi-parameter optimization.[26] The new descriptors and the ML approach are fully described in the "Material and Methods" section of supporting information.…”
Section: Machine Learning Design and Screening · mentioning
confidence: 99%
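The three phases (encoding, Fourier transform, prediction) can be chained in a short sketch. The statement does not name the statistical model, so k-nearest-neighbour regression is swapped in here purely as a placeholder; the training variants and fitness values are made up for illustration, and `featurize` and `predict_fitness` are hypothetical names. The sketch assumes all variants have the same length (point mutants of one parent), so their feature vectors align.

```python
import cmath
import math

# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101); one index keeps the sketch short.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def spectrum(signal):
    """One-sided magnitude spectrum via a direct DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def featurize(sequence, use_fft=True):
    """Encoding phase (index value per residue), optionally followed by the Fourier step."""
    encoded = [HYDROPATHY[r] for r in sequence]
    return spectrum(encoded) if use_fft else encoded


def predict_fitness(train, query, k=2, use_fft=True):
    """Predictive phase: mean fitness of the k nearest training variants (placeholder model)."""
    q = featurize(query, use_fft)
    ranked = sorted((math.dist(featurize(s, use_fft), q), y) for s, y in train)
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up toy variants and fitness values, for illustration only.
TRAIN = [("MKLV", 1.0), ("MKAV", 0.4), ("MRLV", 0.9), ("MKLI", 1.1)]
print(predict_fitness(TRAIN, "MRLI"))
```

Any regressor that maps a fixed-length feature vector to a fitness value could replace the nearest-neighbour step without changing the encoding or Fourier phases.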
“…[7] Indeed, a concatenation of multiple indices, that is, an extended_sequence (Ext_SEQ), is evaluated as a descriptor (as exemplified in Figure 1). In the most recent contribution,[16] we showed that the use of multiple physicochemical indices coupled with the implementation of the FFT, taking into account the interactions between amino acid residues within the protein sequence,[7,17] leads to a very significant improvement in the quality of models. The choice of the descriptor (i.e., combination of indices) with or without applying FFT during the statistical modelling depends on the protein/fitness pair, thereby improving the prediction of enzyme activity.…”
Section: Introduction · mentioning
confidence: 99%
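Choosing a descriptor "with or without applying FFT" for a given protein/fitness pair amounts to a small model-selection loop. The following is a sketch under stated assumptions: each candidate combination of indices is scored with and without the Fourier step by leave-one-out error of a 1-nearest-neighbour predictor (a placeholder, since the quoted text does not name the scoring model), and the Fourier step is applied to each elementary sequence separately before concatenation (also an assumption). `CHARGE` is a hypothetical index, the data are made up, and all function names are hypothetical.

```python
import cmath
import itertools
import math

# Kyte-Doolittle hydropathy scale (AAindex entry KYTJ820101).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Hypothetical simplified charge index (NOT an AAindex entry).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})
INDICES = {"hydropathy": HYDROPATHY, "charge": CHARGE}


def spectrum(signal):
    """One-sided magnitude spectrum via a direct DFT."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def featurize(sequence, names, use_fft):
    """Concatenate one block per chosen index; with use_fft, each block is its spectrum."""
    features = []
    for name in names:
        ele = [INDICES[name][r] for r in sequence]
        features.extend(spectrum(ele) if use_fft else ele)
    return features


def loo_mse(data, names, use_fft):
    """Leave-one-out mean squared error of a 1-nearest-neighbour predictor."""
    feats = [(featurize(s, names, use_fft), y) for s, y in data]
    total = 0.0
    for i, (x, y) in enumerate(feats):
        others = feats[:i] + feats[i + 1:]
        _, y_hat = min((math.dist(x, xo), yo) for xo, yo in others)
        total += (y_hat - y) ** 2
    return total / len(feats)


def select_descriptor(data):
    """Return (error, index names, use_fft) of the best-scoring candidate descriptor."""
    candidates = []
    for r in range(1, len(INDICES) + 1):
        for names in itertools.combinations(sorted(INDICES), r):
            for use_fft in (False, True):
                candidates.append((loo_mse(data, names, use_fft), names, use_fft))
    return min(candidates)


# Made-up toy variants and fitness values, for illustration only.
DATA = [("MKLV", 1.0), ("MKAV", 0.4), ("MRLV", 0.9), ("MKLI", 1.1), ("MELV", 0.2)]
mse, names, use_fft = select_descriptor(DATA)
print(names, "with FFT" if use_fft else "without FFT", round(mse, 3))
```

With real data, the selection would normally be validated on variants held out from the whole search, since picking the minimum over many candidates makes the winning error optimistic.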
“…There have been many different methods for transforming the amino acids to numerical values. The most widely employed methods in the literature include the binary encoding approach,[19] the descriptor encoding approach[20–22] and the profile encoding approach,[6,23] to name a few.…”
Section: Introduction · mentioning
confidence: 99%
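For contrast with descriptor encoding (one index value per residue), the binary encoding approach is commonly implemented as one-hot encoding: each residue becomes a 20-element block with a single 1. The sketch below shows that common form; the exact variant used in the cited works is not stated, and the names `one_hot` and `POSITION` are hypothetical.

```python
# The 20 standard amino acids in one-letter code (alphabetical order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
POSITION = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Binary encoding: a 20-element 0/1 block per residue, concatenated."""
    vector = []
    for residue in sequence:
        block = [0] * len(AMINO_ACIDS)
        block[POSITION[residue]] = 1
        vector.extend(block)
    return vector


encoded = one_hot("MKLV")
print(len(encoded))  # 80: 4 residues x 20 positions
```

Binary encoding carries no physicochemical information, so every substitution is equally distant from every other; descriptor encoding places chemically similar residues close together at the cost of depending on the chosen index.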