Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered to be characteristic signatures of the chemical compound they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Often, spectral similarity measures are used as a proxy for structural similarity but this approach is strongly limited by a generally poor correlation between both metrics. Here, we propose MS2DeepScore: a novel Siamese neural network to predict the structural similarity between two chemical structures solely based on their MS/MS fragmentation spectra. Using a cleaned dataset of > 100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model varieties through Monte-Carlo Dropout is used to further improve the predictions and assess the model’s prediction uncertainty. On 3600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly-reliable structural matches and to predict Tanimoto scores for pairs of molecules based on their fragment spectra with a root mean squared error of about 0.15. Furthermore, the prediction uncertainty estimate can be used to select a subset of predictions with a root mean squared error of about 0.1. Furthermore, we demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can also be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra. Added to the recently introduced unsupervised Spec2Vec metric, we believe that machine learning-supported mass spectral similarity measures have great potential for a range of metabolomics data processing pipelines.
Accurate and reliable measurement of the severity of dystonia is essential for the indication, evaluation, monitoring and fine-tuning of treatments. Assessment of dystonia in children and adolescents with dyskinetic cerebral palsy (CP) is now commonly performed by visual evaluation either directly in the doctor’s office or from video recordings using standardized scales. Both methods lack objectivity and require much time and effort of clinical experts. Only a snapshot of the severity of dyskinetic movements (i.e., choreoathetosis and dystonia) is captured, and they are known to fluctuate over time and can increase with fatigue, pain, stress or emotions, which likely happens in a clinical environment. The goal of this study was to investigate whether it is feasible to use home-based measurements to assess and evaluate the severity of dystonia using smartphone-coupled inertial sensors and machine learning. Video and sensor data during both active and rest situations from 12 patients were collected outside a clinical setting. Three clinicians analyzed the videos and clinically scored the dystonia of the extremities on a 0–4 scale, following the definition of amplitude of the Dyskinesia Impairment Scale. The clinical scores and the sensor data were coupled to train different machine learning models using cross-validation. The average F1 scores (0.67 ± 0.19 for lower extremities and 0.68 ± 0.14 for upper extremities) in independent test datasets indicate that it is possible to detected dystonia automatically using individually trained models. The predictions could complement standard dyskinetic CP measures by providing frequent, objective, real-world assessments that could enhance clinical care. A generalized model, trained with data from other subjects, shows lower F1 scores (0.45 for lower extremities and 0.34 for upper extremities), likely due to a lack of training data and dissimilarities between subjects. However, the generalized model is reasonably able to distinguish between high and lower scores. Future research should focus on gathering more high-quality data and study how the models perform over the whole day.
Selecting behavioral outputs in a dynamic environment is the outcome of integrating multiple information streams and weighing possible action outcomes with their value. Integration depends on the medial prefrontal cortex (mPFC), but how mPFC neurons encode information necessary for appropriate behavioral adaptation is poorly understood. To identify spiking patterns of mPFC during learned behavior, we extracellularly recorded neuronal action potential firing in the mPFC of rats performing a whisker-based “Go”/“No-go” object localization task. First, we identify three functional groups of neurons, which show different degrees of spiking modulation during task performance. One group increased spiking activity during correct “Go” behavior (positively modulated), the second group decreased spiking (negatively modulated) and one group did not change spiking. Second, the relative change in spiking was context-dependent and largest when motor output had contextual value. Third, the negatively modulated population spiked more when rats updated behavior following an error compared to trials without integration of error information. Finally, insufficient spiking in the positively modulated population predicted erroneous behavior under dynamic “No-go” conditions. Thus, mPFC neuronal populations with opposite spike modulation characteristics differentially encode context and behavioral updating and enable flexible integration of error corrections in future actions.
Microbial specialised metabolism is full of valuable natural products that are applied clinically, agriculturally, and industrially. The genes that encode their biosynthesis are often physically clustered on the genome in biosynthetic gene clusters (BGCs). Many BGCs consist of multiple groups of co-evolving genes called sub-clusters that are responsible for the biosynthesis of a specific chemical moiety in a natural product. Sub-clusters therefore provide an important link between the structures of a natural product and its BGC, which can be leveraged for predicting natural product structures from sequence, as well as for linking chemical structures and metabolomics-derived mass features to BGCs. While some initial computational methodologies have been devised for sub-cluster detection, current approaches are not scalable, have only been run on small and outdated datasets, or produce an impractically large number of possible sub-clusters to mine through. Here, we constructed a scalable method for unsupervised sub-cluster detection, called iPRESTO, based on topic modelling and statistical analysis of co-occurrence patterns of enzyme-coding protein families. iPRESTO was used to mine sub-clusters across 150,000 prokaryotic BGCs from antiSMASH-DB. After annotating a fraction of the resulting sub-cluster families, we could predict a substructure for 16% of the antiSMASH-DB BGCs. Additionally, our method was able to confirm 83% of the experimentally characterised sub-clusters in MIBiG reference BGCs. Based on iPRESTO-detected sub-clusters, we could correctly identify the BGCs for xenorhabdin and salbostatin biosynthesis (which had not yet been annotated in BGC databases), as well as propose a candidate BGC for akashin biosynthesis. Additionally, we show for a collection of 145 actinobacteria how substructures can aid in linking BGCs to molecules by correlating iPRESTO-detected sub-clusters to MS/MS-derived Mass2Motifs substructure patterns. This work paves the way for deeper functional and structural annotation of microbial BGCs by improved linking of orphan molecules to their cognate gene clusters, thus facilitating accelerated natural product discovery.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.