2021
DOI: 10.1371/journal.pcbi.1008724

Spec2Vec: Improved mass spectral similarity scoring through learning of structural relationships

Abstract: Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm—Word2Vec. Spec2Vec l…
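The abstract describes Spec2Vec as a similarity score inspired by Word2Vec. The sketch below illustrates the general idea under stated assumptions. A spectrum is treated as a "document" whose words are peak m/z values and neutral losses. Each spectrum is embedded as an intensity-weighted sum of word vectors, and two spectra are compared by the cosine of their embeddings. Several details are illustrative assumptions: the word format (`peak@`/`loss@` with two decimals), the 5 to 200 Da loss window, the square-root intensity weighting, and the toy 2-D vectors in the hypothetical `TOY_VECTORS`. A real model would learn its word vectors by training Word2Vec on a large spectral library. This is not the `spec2vec` package API.

```python
import math

# Hypothetical 2-D word vectors for illustration only; a real model learns
# its vectors by training Word2Vec on many spectra.
TOY_VECTORS = {
    "peak@91.05": [1.0, 0.0],
    "peak@105.07": [0.9, 0.1],
    "loss@18.01": [0.0, 1.0],
}


def spectrum_to_document(peaks, precursor_mz, n_decimals=2, loss_range=(5.0, 200.0)):
    """Convert (mz, intensity) peaks into Spec2Vec-style words plus weights.

    Each peak becomes a "peak@<mz>" word; each neutral loss
    (precursor_mz - mz) inside loss_range becomes a "loss@<loss>" word.
    The loss window is an assumed, illustrative setting.
    """
    words, weights = [], []
    for mz, intensity in peaks:
        words.append(f"peak@{mz:.{n_decimals}f}")
        weights.append(intensity)
    for mz, intensity in peaks:
        loss = precursor_mz - mz
        if loss_range[0] <= loss <= loss_range[1]:
            words.append(f"loss@{loss:.{n_decimals}f}")
            weights.append(intensity)
    return words, weights


def embed(words, weights, word_vectors, power=0.5):
    """Sum word vectors weighted by intensity**power; unknown words are skipped."""
    dim = len(next(iter(word_vectors.values())))
    vec = [0.0] * dim
    for word, weight in zip(words, weights):
        if word in word_vectors:
            for i, x in enumerate(word_vectors[word]):
                vec[i] += (weight ** power) * x
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    # Two toy spectra with no peak m/z values in common.
    doc_a = spectrum_to_document([(91.05, 1.0), (135.08, 0.5)], precursor_mz=153.09)
    doc_b = spectrum_to_document([(105.07, 1.0), (149.09, 0.25)], precursor_mz=167.10)
    emb_a = embed(*doc_a, TOY_VECTORS)
    emb_b = embed(*doc_b, TOY_VECTORS)
    print(f"embedding similarity: {cosine(emb_a, emb_b):.3f}")
```

In the demo the two spectra share no peak m/z values. They still score as highly similar because the hypothetical vectors for `peak@91.05` and `peak@105.07` point in similar directions and both spectra produce the word `loss@18.01`. A score based only on matching peaks would give no credit for the related but non-identical peaks.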

Cited by 126 publications (141 citation statements)
References 36 publications (57 reference statements)

“…Future work will seek to optimize and tune the model itself. Current deep learning methods that predict information from the mass spectra use binned spectra and process them with multilayer perceptrons [20,31], word2vec algorithm inspired by natural language processing [32] or a transformer architecture [57]. A binned spectrum has an obvious drawback of reducing resolution of the original spectrum, losing sensitivity provided by the latest generation of mass spectrometers.…”
Section: Discussion (mentioning)
confidence: 99%
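The quoted passage argues that binning reduces the resolution of the original spectrum. Here is a minimal sketch of that effect, assuming a simple fixed-width binning scheme. The 1 Da width, the 1000 m/z range, and the helper name `bin_spectrum` are illustrative and not taken from any of the cited methods. Two peaks 0.03 Da apart fall into the same bin and their intensities are summed.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=1000.0):
    """Sum (mz, intensity) peaks into fixed-width m/z bins.

    Returns a dense vector of length ceil(max_mz / bin_width); peaks at or
    above max_mz are dropped. bin_width and max_mz are illustrative defaults.
    """
    n_bins = int(math.ceil(max_mz / bin_width))
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        idx = int(mz // bin_width)
        if 0 <= idx < n_bins:
            vec[idx] += intensity
    return vec


if __name__ == "__main__":
    # Two peaks 0.03 Da apart, distinguishable on a high-resolution instrument.
    peaks = [(100.01, 1.0), (100.04, 0.5)]
    binned = bin_spectrum(peaks, bin_width=1.0)
    nonzero = [(i, x) for i, x in enumerate(binned) if x > 0]
    print(nonzero)
```

Both peaks land in bin 100, so the binned vector can no longer tell them apart. Narrower bins keep such peaks separate, but the vector length grows as `max_mz / bin_width`. Binned representations face this trade-off between resolution and vector size.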
“…While demonstrating solid predictive power, the methods still rely on structural fingerprints for the molecule numerical representation, only allowing the deep networks to learn from the predefined molecule substructures. Huber et al. [32] suggests an adaption of the word2vec algorithm, which was developed for natural language processing, for mass spectra representation, allowing for more efficient search in the spectral space with no use of information on the molecules structures. To fully utilise the advantages of deep learning approach, we aimed to combine the efficient numerical representations of both spectra and molecules.…”
Footnote 1: https://mona.fiehnlab.ucdavis.edu/
Section: Introduction (mentioning)
confidence: 99%
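The quoted passage credits the word2vec adaptation with allowing more efficient search in spectral space. One reason fixed-length embeddings help is that library vectors can be normalized once, so each query reduces to one dot product per library entry. This is a general property, not a claim taken from the cited work. The sketch assumes embeddings have already been computed. The spectrum IDs, 2-D vectors, and helper names (`build_index`, `top_k`) are hypothetical.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def build_index(library):
    """Normalize every library embedding once, up front."""
    return {name: normalize(vec) for name, vec in library.items()}


def top_k(query, index, k=3):
    """Return the k (cosine score, name) pairs closest to the query embedding."""
    q = normalize(query)
    scores = ((sum(a * b for a, b in zip(q, vec)), name) for name, vec in index.items())
    return heapq.nlargest(k, scores)


if __name__ == "__main__":
    # Hypothetical precomputed embeddings keyed by made-up spectrum IDs.
    library = {"spectrum_a": [1.0, 0.0], "spectrum_b": [0.0, 1.0], "spectrum_c": [1.0, 1.0]}
    index = build_index(library)
    for score, name in top_k([1.0, 0.1], index, k=2):
        print(f"{name}: {score:.3f}")
```

This is a brute-force scan. For very large libraries, an approximate nearest-neighbour index would typically replace the linear loop, trading exactness for speed.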
“…[128] Such situations are difficult to resolve and may signpost the border area of where spectral-based analyses are useful. However, with the increase of publicly available data and the development of novel tools such as the new mass spectral similarity measure Spec2Vec as well as alternative networking-based approaches [45,134,135], it could well be possible to group together structurally similar metabolites (according to historical reasons and/or biosynthetic routes) taking into account that mass spectral features that are not exactly similar could still be related to each other. Moreover, with an increased number of annotated datasets, supervised machine learning approaches could further improve on the current performance of Spec2Vec.…”
Section: Structural Diversity and the Limitations of Spectrum-Based Analysis (mentioning)
confidence: 99%
“…For example, in metabolomics analysis, mass spectral similarity metrics play a pivotal role across many tasks, including library matching and analogue searching. Our group applied ML to this task for the first time, resulting in the unsupervised Spec2Vec algorithm (16), which showed increased performance in library matching and analogue searching through the learning of relationships between mass features in many MS/MS spectra. Furthermore, we recently proposed the supervised MS2DeepScore algorithm (17), which was trained to learn molecular structural similarities based on MS/MS spectral pairs, resulting in an even better overall performance.…”
Section: Commentary (mentioning)
confidence: 99%
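Spec2Vec and MS2DeepScore are presented as learned alternatives to conventional spectral similarity metrics for library matching. For contrast, here is a minimal sketch of a simple greedy peak-matching cosine score. It assumes centroided spectra given as (m/z, intensity) lists and an illustrative 0.02 Da matching tolerance. The function name `cosine_greedy` is hypothetical. The sketch omits the intensity and m/z weighting and the precursor-shift matching (modified cosine) that practical tools typically add.

```python
import math


def cosine_greedy(spec_a, spec_b, tolerance=0.02):
    """Greedy peak-matching cosine score between two centroided spectra.

    Each spectrum is a list of (mz, intensity) pairs. Two peaks can match
    when their m/z values differ by at most `tolerance`; each peak is used
    at most once, and pairs with larger intensity products are taken first.
    """
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)  # largest intensity products first

    used_a, used_b, matched = set(), set(), 0.0
    for product, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matched += product

    norm_a = math.sqrt(sum(x * x for _, x in spec_a))
    norm_b = math.sqrt(sum(x * x for _, x in spec_b))
    return matched / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    query = [(100.0, 3.0), (200.0, 4.0)]
    reference = [(100.01, 3.0), (215.0, 4.0)]
    print(f"cosine: {cosine_greedy(query, reference):.2f}")
```

Peaks that are related but lie more than `tolerance` apart contribute nothing to this score. That is the limitation the excerpts above point to when they note that mass spectral features that are not exactly similar could still be related.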