2020
DOI: 10.1101/2020.04.08.031948
Preprint

SweetOrigins: Extracting Evolutionary Information from Glycans

Abstract: Glycans, the most diverse biopolymer and crucial for many biological processes, are shaped by evolutionary pressures stemming in particular from host-pathogen interactions. While this positions glycans as being essential for understanding and targeting host-pathogen interactions, their considerable diversity and a lack of methods has hitherto stymied progress in leveraging their predictive potential. Here, we utilize a curated dataset of 12,674 glycans from 1,726 species to develop and apply machine learning m…

Citations: cited by 9 publications (17 citation statements)
References: 44 publications (37 reference statements)

“…The nonlinear branching structure of glycans, together with their diversity, has hitherto presented an obstacle to the development of machine learning models for glycobiology that fully capitalized on the rich information in glycan sequences. The use of a glycoword-based language model overcame some of these limitations, allowing for the prediction of glycan immunogenicity, pathogenicity, or taxonomic class (Bojar et al, 2020a, 2020b, 2021); data augmentation inspired by graph isomorphism further improved predictions (Bojar et al, 2021). This led us to consider whether the structure of glycans as graphs or trees could be better captured by neural network architectures specifically developed for modeling graphs.…”
Section: Developing a GCNN for Glycans
Type: mentioning
confidence: 99%
“…Computational approaches to analyzing glycans are mostly limited to counting the occurrence of curated sequence motifs and using this information as input for models predicting glycan properties (Bao et al, 2019; Coff et al, 2020). Recently, deep learning has been applied to the analysis of glycan sequences, creating glycan language models based on recurrent neural networks (Bojar et al, 2020a, 2020b, 2021). The glycan language model SweetTalk views glycans as a sequence of "glycowords" (subsequences that describe structural contexts of a glycan) and was used to predict the taxonomic class of glycans as well as their properties, such as immunogenicity or contribution to pathogenicity.…”
Section: Introduction
Type: mentioning
confidence: 99%
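
The idea of viewing a glycan as a sequence of overlapping subsequences can be sketched roughly as follows. This is only an illustration and not the SweetTalk implementation: the function name `glycowords`, the window size of 3, and the example tokens are all assumptions made here, and the sketch handles only a linear chain, ignoring branching.

```python
from typing import List


def glycowords(glycoletters: List[str], size: int = 3) -> List[List[str]]:
    """Return every contiguous window of `size` tokens (hypothetical helper).

    `glycoletters` is assumed to be an already tokenised, linear glycan
    (monosaccharides and linkages in order). Branches are not handled.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    # Slide a fixed-size window over the token list, one token at a time.
    return [
        glycoletters[i:i + size]
        for i in range(len(glycoletters) - size + 1)
    ]


# Illustrative tokens only; not taken from the cited dataset.
tokens = ["Gal", "b1-4", "GlcNAc", "b1-2", "Man"]
words = glycowords(tokens, size=3)
```

Each window overlaps its neighbour, so every token is seen together with its local structural context, which is the property the quoted passage attributes to glycowords.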
“…The ideal way to compare different systems is to run each on the same input and obtain their respective predictions, then compare their outputs with scores such as AUC, R-squares and mean square errors. Here we compare GlyNet with CCARL by Coff et al [30] and SweetTalk by Bojar et al [31-34].…”
Section: Comparison With Other Machine Learning Models
Type: mentioning
confidence: 99%
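
The three scores named in the statement above can be computed as in the following minimal sketch. It uses only the Python standard library; the function names and the toy numbers are assumptions chosen for illustration and are not taken from GlyNet, CCARL, or SweetTalk.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are assumed to be 0 or 1,
    with at least one example of each class.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: the same inputs would be scored by each model being compared.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Running every model on identical inputs and scoring each with the same functions keeps the comparison fair, which is the point the quoted passage makes.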
“…Previous machine learning (ML) approaches to protein-glycan interaction employed techniques like support vector machines [17,18], graph kernels [18], modularity optimization methods [19], and Markov models [20-22], to identify glycan motifs, substructures that specific proteins recognize; for reviews see Mamitsuka [23], Haab [24], and Sese [25]. Several publications from Aoki-Kinoshita and co-workers [20,22,26-29], Coff and co-workers [30], Cummings and co-workers [14], and recently Bojar and co-workers [31-34] focus on ML classification models that predict qualitative features (think "strong vs. weak" interactions) for each glycan-protein pair. Woods and co-workers combined molecular mechanics, automated 3D glycan structure generation and docking techniques to produce computational carbohydrate grafting that can qualitatively predict binding between a carbohydrate fragment and a known 3D protein structure [35].…”
Type: mentioning
confidence: 99%