Collagen is one of the most important structural proteins in biology,
and its structural hierarchy plays a crucial role in many mechanically
important biomaterials. Here, we demonstrate how transformer models
can be used to predict, directly from the primary amino acid sequence,
the thermal stability of collagen triple helices, measured via the
melting temperature T_m. We report two
distinct transformer architectures to compare performance. First,
we train a small transformer model from scratch, using our collagen
data set featuring only 633 sequence-to-T_m pairings. Second, we use a large pretrained transformer model, ProtBERT,
and fine-tune it for this downstream task on the sequence-to-T_m pairings, with a deep convolutional network
to translate natural language processing BERT embeddings into required
features. Both the small transformer model and the fine-tuned ProtBERT
model have similar R² values on the test data
(R² = 0.84 vs 0.79, respectively), but
ProtBERT is a much larger pretrained model that may not always
be applicable for other biological or biomaterials questions. Specifically,
we show that the small transformer model requires only 0.026% of the
number of parameters compared to the much larger model but reaches
almost the same accuracy for the test set. We compare the performance
of both models on a validation set of 71 newly published sequences for which T_m has been obtained, and
find reasonable agreement, with ProtBERT outperforming the small transformer
model. The results presented here are, to the best of our knowledge, the
first demonstration of the use of transformer models for relatively
small data sets and for the prediction of specific biophysical properties
of interest. We anticipate that the work presented here serves as
a starting point for transformer models to be applied to other biophysical
problems.
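The R² values quoted above can be illustrated with a short calculation. The following is a minimal sketch that assumes the standard coefficient of determination, 1 - SS_res / SS_tot; the abstract does not state which definition of R² was used. The function name `r_squared` and the melting temperatures are hypothetical and are not data from the paper.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical melting temperatures in deg C (illustrative only, not from the paper).
measured = [30.0, 35.0, 40.0, 45.0]
predicted = [31.0, 34.0, 41.0, 44.0]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.3f}")
```

Under this definition, a score of 1 means the predictions match the measurements exactly, and a score of 0 means the model does no better than predicting the mean measured T_m for every sequence.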
Incorporating dynamic metal-coordination bonds as cross-links into synthetic materials has become attractive not only to improve self-healing and toughness, but also due to the tunability of metal-coordination bonds. However, a method for a priori determination of the bond lifetime of metal-coordination complexes, which is especially important in the rational design of metal-coordinated materials with prescribed properties, is missing. We report an empirical relationship between the energy landscape of metal-coordination bonds, simulated via metadynamics, and the resulting macroscopic relaxation time in ideal metal-coordinated hydrogels. Importantly, we expand the Arrhenius relationship between the macroscopic hydrogel relaxation time and metal-coordinate bond activation energy to include the width and landscape ruggedness identified in the simulated energy landscapes. Using biologically relevant Ni²⁺-nitrogen coordination complexes as a model case, we demonstrate that the quantitative relationship developed from histidine-Ni²⁺ and imidazole-Ni²⁺ complexes can predict the average relaxation times of other Ni²⁺-nitrogen coordinated networks. We anticipate the quantitative relationship presented here to be a starting point for the development of more sophisticated models that can predict relaxation timescales of materials with programmable viscoelastic properties.
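The Arrhenius relationship that this work starts from can be sketched numerically. The following is a minimal sketch of the standard textbook form only, tau = tau0 * exp(Ea / (R T)); the expanded relationship that includes width and landscape ruggedness is not given in this abstract and is not reproduced here. The function name, the prefactor `tau0_s`, and all numerical values are illustrative assumptions, not values from the paper.

```python
import math

# Molar gas constant in kJ/(mol K).
GAS_CONSTANT_KJ_PER_MOL_K = 0.008314462618


def arrhenius_relaxation_time(
    activation_energy_kj_mol: float,
    temperature_k: float,
    tau0_s: float,
) -> float:
    """Standard Arrhenius form: tau = tau0 * exp(Ea / (R * T))."""
    if temperature_k <= 0:
        raise ValueError("temperature must be positive (kelvin)")
    thermal_energy = GAS_CONSTANT_KJ_PER_MOL_K * temperature_k
    return tau0_s * math.exp(activation_energy_kj_mol / thermal_energy)


# Hypothetical inputs (assumptions for illustration only).
temperature = 298.15   # K
prefactor = 1e-9       # s, assumed attempt time
barrier = 60.0         # kJ/mol, assumed activation energy

tau = arrhenius_relaxation_time(barrier, temperature, prefactor)

# Raising the barrier by R*T*ln(10) lengthens the relaxation time tenfold.
delta = GAS_CONSTANT_KJ_PER_MOL_K * temperature * math.log(10)
tau_higher = arrhenius_relaxation_time(barrier + delta, temperature, prefactor)
print(f"tau = {tau:.3e} s, ratio = {tau_higher / tau:.2f}")
```

In this baseline form the relaxation time depends only on the barrier height, which is why additional landscape descriptors such as width and ruggedness require an extension of the relationship.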