2021
DOI: 10.1021/acs.jcim.1c00099

PyPEF—An Integrated Framework for Data-Driven Protein Engineering

Abstract: Data-driven strategies are gaining increased attention in protein engineering due to recent advances in access to large experimental databanks of proteins, next-generation sequencing (NGS), high-throughput screening (HTS) methods, and the development of artificial intelligence algorithms. However, the reliable prediction of beneficial amino acid substitutions, their combination, and the effect on functional properties remain the most significant challenges in protein engineering, which is applied to develop pr…

Cited by 23 publications (39 citation statements).
References: 106 publications.
“…Thus, by applying Fourier Transforms, we can capture, to some extent, this spatial dependency. Furthermore, this property results beneficial for any property-based encoding strategy, as previously reported in Siedhoff et al (2021) , Cadet et al (2018b) , and Cosic (1994) . Based on the above, we propose the combination of our encoders together with the application of Fourier transforms in order to improve the performance of predictive models.…”
Section: Results (supporting)
confidence: 65%
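The property-based encoding plus Fourier transform idea mentioned in this statement can be sketched as follows, under stated assumptions: each residue is mapped to a single numeric value from a physicochemical scale, and the resulting sequence-length signal is passed through a discrete Fourier transform whose magnitudes serve as features. The values in `PROPERTY` are made up for illustration (not a real AAindex entry), and the function names are hypothetical; this is not PyPEF's implementation.

```python
import cmath

# Hypothetical, made-up property scale (NOT a real AAindex entry):
# one hydrophobicity-like number per canonical amino acid.
PROPERTY = {
    "A": 0.6, "C": 0.3, "D": -0.9, "E": -0.7, "F": 1.0,
    "G": 0.0, "H": -0.4, "I": 1.0, "K": -1.0, "L": 0.9,
    "M": 0.7, "N": -0.8, "P": -0.2, "Q": -0.7, "R": -1.0,
    "S": -0.3, "T": -0.2, "V": 0.8, "W": 0.5, "Y": 0.2,
}


def encode(seq):
    """Map each residue to its property value, giving a numeric 'signal'."""
    return [PROPERTY[aa] for aa in seq]


def dft_magnitudes(signal):
    """Plain O(n^2) discrete Fourier transform; returns |X_k| for k = 0..n//2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        x_k = sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        mags.append(abs(x_k))
    return mags


spectrum = dft_magnitudes(encode("MKTAYIAKQR"))
```

For a 10-residue sequence this yields six magnitudes (k = 0..5), where the k = 0 term is the absolute sum of the property values. A practical pipeline would normally use an FFT routine such as `numpy.fft.rfft` instead of this explicit loop.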
“…Unsurprisingly, encoders play a fundamental role in the quality of the outcome of predictive models ( Yang et al, 2019 ; Wittmann et al, 2021 ). However, while there is a wide variety of encoding techniques, there is no general agreement on which one to select for a specific task ( Yang et al, 2018 ; Siedhoff et al, 2021 ). The first encoding approaches represented amino acid sequences in discrete manner (numeric-wise), using techniques such as One Hot or Ordinal Encoder ( Winter 1998 ; Pavelka et al, 2009 ; Brownlee, 2020 ).…”
Section: Introduction (mentioning)
confidence: 99%
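The two discrete encodings named in this statement, ordinal and one-hot, can be illustrated with a minimal sketch over the 20 canonical amino acids. The function names are hypothetical and are not taken from any of the cited tools.

```python
# The 20 canonical residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def ordinal_encode(seq):
    """One integer (0..19) per residue; implies an arbitrary ordering of residues."""
    return [INDEX[aa] for aa in seq]


def one_hot_encode(seq):
    """A 20-element 0/1 vector per residue, flattened to len(seq) * 20 features."""
    features = []
    for aa in seq:
        vec = [0] * len(AMINO_ACIDS)
        vec[INDEX[aa]] = 1
        features.extend(vec)
    return features
```

Ordinal encoding keeps one feature per position but introduces an artificial order (A < C < D …) that a model may wrongly exploit; one-hot encoding avoids that order at the cost of 20 features per position.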
“…Data-driven recombination of beneficial substitutions, including model training (parameter optimization), validation, and prediction, was performed using the data-driven protein engineering framework PyPEF (Siedhoff et al, 2021). This framework performs sequence-based model training using diverse machine learning algorithms available from the Scikit-learn Python package (Pedregosa et al, 2011).…”
Section: Methods (mentioning)
confidence: 99%
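The training, validation, and prediction steps for recombining beneficial substitutions can be illustrated with a deliberately simple stand-in: an additive (no-epistasis) model fitted from single-substitution measurements. This is a swapped-in technique for illustration only, not PyPEF's method, which per this statement relies on scikit-learn regressors. All function names, variant names, and fitness values below are hypothetical.

```python
from itertools import combinations


def parse(variant):
    """'A12G' -> ('A', 12, 'G'): wild-type residue, position, new residue."""
    return variant[0], int(variant[1:-1]), variant[-1]


def fit_additive(single_data, wt_fitness):
    """Training: effect of a substitution = its measured fitness minus wild type."""
    return {v: f - wt_fitness for v, f in single_data.items()}


def predict(combo, effects, wt_fitness):
    """Prediction: wild type plus the sum of individual effects (no epistasis)."""
    return wt_fitness + sum(effects[v] for v in combo)


def mean_abs_error(measured, effects, wt_fitness):
    """Validation: compare predictions with measured multi-substitution variants."""
    errors = [abs(predict(name.split("/"), effects, wt_fitness) - f)
              for name, f in measured.items()]
    return sum(errors) / len(errors)


def recombine(effects, wt_fitness, order=2):
    """Rank combinations of beneficial substitutions at distinct positions."""
    beneficial = sorted(v for v, e in effects.items() if e > 0)
    ranked = []
    for combo in combinations(beneficial, order):
        positions = [parse(v)[1] for v in combo]
        if len(set(positions)) == order:  # at most one substitution per site
            ranked.append((predict(combo, effects, wt_fitness), combo))
    return sorted(ranked, reverse=True)


# Hypothetical example data (wild-type fitness normalised to 1.0).
WT = 1.0
singles = {"A12G": 1.3, "A12V": 1.25, "L30F": 1.2, "K45R": 0.8, "S50T": 1.1}
effects = fit_additive(singles, WT)
ranked = recombine(effects, WT)
error = mean_abs_error({"A12G/L30F": 1.4}, effects, WT)
```

Here the top-ranked pair is A12G/L30F; the gap between its additive prediction and the (hypothetical) measured value is what a validation step would use to judge whether an additive model is adequate or whether a model that captures epistasis is needed.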
“…Recently, machine learning (ML) has been used to accelerate directed evolution of proteins. In this method, saturation mutagenesis and/or random mutagenesis are performed to generate an initial library. The variants in the library are experimentally evaluated to obtain their sequences and functions and then used as training data to construct a ML model that predicts the function from the sequence.…”
Section: Introduction (mentioning)
confidence: 99%
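The first step of the workflow described in this statement, generating an initial library by saturation mutagenesis, can be sketched as enumerating all 19 non-wild-type substitutions at each chosen position. After experimental evaluation, such variants would serve as training data for a sequence-to-function model. The function name and the short example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical residues


def site_saturation_library(wild_type, positions):
    """All single substitutions at the given 1-based positions (19 per site)."""
    library = []
    for pos in positions:
        wt_aa = wild_type[pos - 1]
        for aa in AMINO_ACIDS:
            if aa != wt_aa:
                variant_seq = wild_type[:pos - 1] + aa + wild_type[pos:]
                library.append((f"{wt_aa}{pos}{aa}", variant_seq))
    return library


library = site_saturation_library("MKTAY", [2, 4])
```

Each entry pairs a variant name (e.g. `K2A`) with its full sequence, so the sequences can be encoded directly while the names stay readable for bookkeeping.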