Paving the way to single-molecule protein sequencing

Restrepo-Pérez, Laura; Joo, Chirlmin; Dekker, Cees

doi:10.1038/s41565-018-0236-6

Cited by 337 publications

(342 citation statements)

References 106 publications

Supporting

Mentioning

340

Contrasting

Unclassified

Order By: Relevance

“…The advent of new protein sequencing technologies has accelerated the rate of protein discovery [1]. While protein sequence repositories are growing exponentially, existing methods for experimental characterization are not able to keep up with the present rate of novel sequence discovery [2,3].…”

Section: Introductionmentioning

confidence: 99%

Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks

Nambiar

Liu

Hopkins

et al. 2020

Preprint

View full text Add to dashboard Cite

The scientific community is rapidly generating protein sequence information, but only a fraction of these proteins can be experimentally characterized. While promising deep learning approaches for protein prediction tasks have emerged, they have computational limitations or are designed to solve a specific task. We present a Transformer neural network that pre-trains task-agnostic sequence representations. This model is fine-tuned to solve two different protein prediction tasks: protein family classification and protein interaction prediction. Our method is comparable to existing state-of-the art approaches for protein family classification, while being much more general than other architectures. Further, our method outperforms all other approaches for protein interaction prediction. These results offer a promising framework for fine-tuning the pre-trained sequence representations for other protein prediction tasks.to identify functional characteristics is critical to understanding cellular functions as well as developing potential therapeutic applications [4]. Sequence-based methods to computationally infer protein characteristics have been critical for inferring protein function and other characteristics [5]. Thus, the development of computational methods to infer protein characteristics (which we generally describe as "protein prediction tasks") has become paramount in the field of bioinformatics and computational biology. Here, we develop a Transformer neural network to establish task-agnostic representations of protein sequences, and use the Transformer network to solve two protein prediction tasks. Background: Deep LearningDeep learning, a class of machine learning based on the use of artificial neural networks, has recently transformed the field of computational biology and medicine through its application towards long-standing problems such as image analysis, gene expression modeling, sequence variant calling, and putative drug discovery [6,7,8,9,10]. By leveraging deep learning, field specialists have been able to efficiently design and train models without the extensive feature engineering required by previous methods. In applying deep learning to sequence-based protein characterization tasks, we first consider the field of natural language processing (NLP), which aims to analyze human language through computational techniques [11]. Deep learning has recently proven to be a critical tool for NLP, achieving state-ofthe-art performance on benchmarks for named entity recognition, sentiment analysis, question answering, and text summarization, among others [12,13].Neural networks are functions that map one vector space to another. Thus, in order to use them for NLP tasks, we first need to represent words as real-valued vectors. Often referred to as word embeddings, these vector representations are typically "pre-trained" on an auxiliary task for which we have (or can automatically generate) a large amount of training data. The goal of this pre-training is to learn generically useful representations that enc...

show abstract

Section: Introductionmentioning

confidence: 99%

Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks

Nambiar

Liu

Hopkins

et al. 2020

Preprint

View full text Add to dashboard Cite

show abstract

“…Often test samples may contain only trace amounts so single molecule methods are of interest. With nanopores a single polymer molecule (DNA, RNA, protein) is electrophoretically transported through a pore in an electrolytic cell (e-cell) and the resulting current blockades probed to identify the sequence of monomers in the polymer [3,4]. A comprehensive review that charts developments in protein sequencing along four different directions is given in [5].…”

Section: Introductionmentioning

confidence: 99%

Label-free amino acid identification forde novoprotein sequencing via tRNA charging and current blockade in a nanopore

Sampath¹

2020

Preprint

View full text Add to dashboard Cite

Label-free identification of amino acids (AAs) based on superspecific transfer RNAs (tRNAs) and amino-acyl tRNA synthetase (AARS) is explored. Molecules of a specific AA, a tRNA and its cognate AARS, and adenosine triphosphate (ATP) from a reservoir are confined to a micro-sized cavity by hydraulic pressure to counter the effects of diffusion. In this confined space a cognate tRNA in the cavity gets charged with AA and adenosine monophosphate (AMP) is released. The products are transferred to an electrolytic cell with a nanopore where AA, AMP, and ATP cause current blockades of different sizes. If an AMP-sized blockade occurs it means that a tRNA molecule has been charged with AA, this identifies AA; otherwise the procedure is repeated with a different tRNA and AARS until AA is identified. Unlike in most other nanopore-based methods, there is no need for precise measurement of current blockade levels. The efficacy of the procedure is assessed by simulating the movement of particles in a reservoir-cavity structure and analyzing the blockades that occur in the nanopore. In the former the probability of reactants escaping the cavity is found to be near 0. In the latter the probability of an error in identifying an AMP blockade is found to be less than 10% in the worst case (which occurs with the largest volume AA, namely Tryptophan). Detailed information and data are provided in a Supplementary File.

show abstract

“…[1][2][3][4][5][6] One of the primary driving forces behind this research is the development of nanopores as label-free, stochastic sensors at the ultimate analytical limit (i.e., single molecule). 7-10 Such detectors have applications ranging from the analysis of biopolymers such as DNA [11][12][13][14][15][16][17][18] or proteins, [19][20][21][22][23] to the detection and quantification of biomarkers, [24][25][26][27][28][29][30] to the fundamental study of chemical or enzymatic reactions at the single molecular level. 22,[31][32][33][34] Nanopores are typically operated in the resistive-pulse mode, where the fluctuations of their ionic conductance are monitored over time.…”

Section: Introductionmentioning

confidence: 99%

Modeling of Ion and Water Transport in the Biological Nanopore ClyA

Willems

Ruić

Lucas

et al. 2020

Preprint

View full text Add to dashboard Cite

In recent years, the protein nanopore cytolysin A (ClyA) has become a valuable tool for the detection, characterization and quantification of biomarkers, proteins and nucleic acids at the single-molecule level. Despite this extensive experimental utilization, a comprehensive computational study of ion and water transport through ClyA is currently lacking. Such a study yields a wealth of information on the electrolytic conditions inside the pore and on the scale the electrophoretic forces that drive molecular transport. To this end we have built a computationally efficient continuum model of ClyA which, together with an extended version of Poison-Nernst-Planck-Navier-Stokes (ePNP-NS) equations, faithfully reproduces its ionic conductance over a wide range of salt concentrations. These ePNP-NS equations aim to tackle the shortcomings of the traditional PNP-NS models by self-consistently taking into account the influence of both the ionic strength and the nanoscopic scale of the pore on all relevant electrolyte properties. In this study, we give both a detailed description of our ePNP-NS 1 model and apply it to the ClyA nanopore. This enabled us to gain a deeper insight into the influence of ionic strength and applied voltage on the ionic conductance through ClyA and a plethora of quantities difficult to assess experimentally. The latter includes the cation and anion concentrations inside the pore, the shape of the electrostatic potential landscape and the magnitude of the electro-osmotic flow. Our work shows that continuum models of biological nanopores-if the appropriate corrections are applied-can make both qualitatively and quantitatively meaningful predictions that could be valuable tool to aid in both the design and interpretation of nanopore experiments.

show abstract

Paving the way to single-molecule protein sequencing

Cited by 337 publications

References 106 publications

Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks

Transforming the Language of Life: Transformer Neural Networks for Protein Prediction Tasks

Label-free amino acid identification forde novoprotein sequencing via tRNA charging and current blockade in a nanopore

Modeling of Ion and Water Transport in the Biological Nanopore ClyA

Contact Info

Product

Resources

About