Protein Abundance Prediction Through Machine Learning Methods

Ferreira, Maurício Alexander de Moura; Ventorim, Rafaela Zandonade; Almeida, Eduardo L.; Silveira, Sabrina; Silveira, Wendel Batista da

doi:10.1101/2020.09.17.302182

Cited by 4 publications

(5 citation statements)

References 63 publications

(73 reference statements)


“…Standard codon usage metrics were shown to be highly predictive of protein abundance. For instance, an AdaBoost model trained on a number of codon usage metrics in S. cerevisiae genes coding for high-abundance proteins (top 10%) and low-abundance proteins (lowest 10%) was highly predictive of these extremes of protein abundance (R² = 0.95) (Ferreira et al, 2020).…”

Section: Regulatory Mechanisms In Specific Coding and Non-coding Regions (mentioning)

confidence: 99%

Learning the Regulatory Code of Gene Expression

Zrimec, Buric, Kokina et al. 2021

Front. Mol. Biosci.


Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology.
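The citing statement above refers to "standard codon usage metrics" without naming them. As a minimal sketch of what one such metric looks like, the following computes relative synonymous codon usage (RSCU) for a single coding sequence. RSCU is chosen here purely for illustration and is an assumption, not necessarily a feature used by Ferreira et al.; the function name `rscu` and the example sequence are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds):
    """Return RSCU values for one coding sequence (illustrative helper).

    RSCU of a codon = observed count divided by the mean count of all
    codons that encode the same amino acid. Amino acids absent from the
    sequence are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Hypothetical example: four alanine codons, three GCT and one GCC.
example = rscu("GCTGCTGCTGCC")
print(example["GCT"], example["GCC"], example["GCA"])
```

Values like these, computed per gene, are the kind of numeric features a regression model such as AdaBoost could be trained on; the actual feature set is described in the cited paper, not here.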


“…Another approach, developed by Terai and Asai (2020), uses features such as the accessibility around the Shine-Dalgarno sequence, minimum free energy of the mRNA molecule, Viterbi score, and inside-outside score. Further, Ferreira et al (2021) explored codon usage bias information to train an AdaBoost regression model, achieving higher correlations than previous approaches without the usage of transcriptomics data.…”

Section: Parrot: Prediction Of Enzyme Abundances Using Protein-constr... (mentioning)

confidence: 99%

PARROT: Prediction of enzyme abundances using protein-constrained metabolic models

Ferreira, Silveira, Nikoloski 2022

Preprint


Motivation: Protein allocation determines the activity of cellular pathways and affects growth across all organisms. Therefore, a variety of experimental and machine learning approaches have been developed to quantify and predict protein abundances, respectively. Yet, despite advances in protein quantification, it remains challenging to predict condition-specific allocation of enzymes in metabolic networks.

Results: Here we propose a family of constraint-based approaches, termed PARROT, to predict enzyme allocations based on the principle of minimizing the enzyme allocation adjustment using protein-constrained metabolic models. To this end, PARROT variants model the minimization of enzyme reallocation using four different (combinations of) distance functions. We demonstrate that the PARROT variant that minimizes the Manhattan distance of enzyme allocations outperforms existing approaches based on the parsimonious distribution of fluxes or enzymes for both Escherichia coli and Saccharomyces cerevisiae. Further, we show that the combined minimization of flux and enzyme allocation adjustment leads to poor and inconsistent predictions. Together, our findings indicate that minimization of resource rather than flux redistribution is a governing principle determining steady-state pathway activity for microorganisms grown in suboptimal conditions.

Availability and implementation: The implementation of PARROT can be found in the GitHub repository: https://github.com/mauricioamf/PARROT
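To make the Manhattan-distance objective in the abstract concrete, here is a minimal sketch of the distance itself, applied to two hypothetical enzyme allocation vectors. It only illustrates the quantity being minimized; it is not the PARROT implementation, which solves a constrained optimization over a protein-constrained metabolic model. The enzyme identifiers, the numbers, and the function name `manhattan_distance` are all made up for the example.

```python
def manhattan_distance(reference, candidate):
    """Sum of absolute differences between two enzyme allocations.

    Both arguments map an enzyme identifier to its allocated amount.
    An enzyme missing from one mapping is treated as allocated 0.
    """
    enzymes = set(reference) | set(candidate)
    return sum(
        abs(candidate.get(e, 0.0) - reference.get(e, 0.0)) for e in enzymes
    )


# Hypothetical allocations in a reference condition and two candidate
# allocations for a suboptimal condition (arbitrary units).
reference = {"E1": 1.0, "E2": 2.0, "E3": 0.5}
candidates = {
    "small_adjustment": {"E1": 1.5, "E2": 1.0, "E3": 0.5},
    "large_adjustment": {"E1": 3.0, "E2": 0.0, "E3": 0.0},
}

# Pick the candidate that requires the least total reallocation.
best = min(
    candidates,
    key=lambda name: manhattan_distance(reference, candidates[name]),
)
print(best)
```

In this toy setting the candidates are enumerated by hand; in a constraint-based model the candidate allocation would instead be a decision variable bounded by the model's constraints.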


“…Moreover, issues may arise from errors in the installation, configuration, or use of 'competitor' frameworks. Typical examples are misunderstanding memory management and/or using insufficient compute resources (Balaji and Allen, 2018), or failing to use comparable resource budgets (Ferreira et al, 2021).…”

Section: The Need For Standardized Benchmarks (mentioning)

confidence: 99%

AMLB: an AutoML Benchmark

Gijsbers, Bueno, Coors et al. 2022

Preprint


Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.
