2022
DOI: 10.3389/fbinf.2022.828703

GMEmbeddings: An R Package to Apply Embedding Techniques to Microbiome Data

Abstract: Large-scale microbiome studies investigating disease-inducing microbial roles base their findings on differences between microbial count data in contrasting environments (e.g., stool samples between cases and controls). These microbiome survey studies are often impeded by small sample sizes and database bias. Combining data from multiple survey studies often results in obvious batch effects, even when DNA preparation and sequencing methods are identical. Relatedly, predictive models trained on one microbial DN…

Cited by 2 publications (4 citation statements). References: 41 publications (52 reference statements).
“…Once the matrix of embeddings is created, the new data are simply multiplied by the embedding matrix to produce a new table of embedded data. GMEmbeddings [145] provides embeddings based on GloVe [146], an NLP algorithm, by aligning requested samples to known amplicon sequence variants (ASVs) using BLAST. This same GloVe algorithm can generate an embedding of a user-uploaded abundance matrix [147].…”
Section: Results (mentioning)
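The projection step described in this statement amounts to a single matrix product: an abundance table with samples as rows and ASVs as columns, multiplied by an embedding matrix with one row per ASV, yields one fixed-length vector per sample. The following is a minimal pure-Python sketch, assuming the table's columns are already in the same ASV order as the embedding rows. The function `embed` and the toy numbers are hypothetical illustrations, not the GMEmbeddings API.

```python
# Minimal sketch: project a samples x ASVs abundance table into embedding space.
# Hypothetical toy data and function name; not taken from GMEmbeddings.

def embed(abundance, embedding):
    """Return a samples x dims table: each sample row times the ASVs x dims matrix.

    abundance: list of rows, one per sample, one column per ASV.
    embedding: list of rows, one per ASV (same order as the abundance columns).
    """
    n_asvs = len(embedding)
    dims = len(embedding[0])
    result = []
    for row in abundance:
        if len(row) != n_asvs:
            raise ValueError(
                f"sample has {len(row)} ASV columns, embedding has {n_asvs} rows"
            )
        # Each output coordinate is an abundance-weighted sum over the ASV vectors.
        result.append(
            [sum(row[k] * embedding[k][d] for k in range(n_asvs)) for d in range(dims)]
        )
    return result


# Toy example: 2 samples x 3 ASVs, and a 3 ASVs x 2 dims embedding matrix.
counts = [
    [10, 0, 5],
    [0, 2, 2],
]
asv_embedding = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.5, 0.5],
]

embedded = embed(counts, asv_embedding)
print(embedded)  # [[12.5, 2.5], [1.0, 3.0]]
```

Because each output row is a weighted sum of ASV vectors, raw counts from samples with different sequencing depths produce vectors on different scales; this sketch leaves out any normalization, which would be a separate, explicit choice.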
“…The latter one produces a [much] smaller amount of sequences. This is why this data source is used by methods that focus on speed, such as [51] and [52], or methods that aim to create fixed pre-trained embeddings, like [53]. However, today, most methods for metagenomic analysis rely on Next-Generation Sequencing.…”
Section: Results (mentioning)
“…The goal of this approach is to improve the accuracy of disease prediction compared to using a single deep learning model. GMEmbeddings ([53]) provides embeddings based on GloVe ([127]), a Natural Language Processing algorithm, by aligning requested samples to known ASV using BLAST. ([128]) uses the same GloVe algorithm to generate an embedding of a user-uploaded abundance matrix.…”
Section: Data Augmentation (mentioning)
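The alignment step mentioned in these statements can be sketched as a best-hit lookup: align each query ASV against the reference ASVs that have embeddings, keep at most one accepted hit per query, and add the query's counts to that reference ASV's column, so that the resulting row can then be multiplied by the embedding matrix. This is a hypothetical pure-Python sketch, not GMEmbeddings' implementation. It assumes BLAST+ tabular output (`-outfmt 6`, standard 12 columns) has already been produced. The best-bitscore rule, the 99% identity cutoff, and the names `best_hits` and `remap_counts` are illustrative choices, not parameters or functions taken from the package.

```python
# Hypothetical sketch: map query ASVs onto reference ASVs from BLAST+ tabular output.
# Column names follow BLAST+ format specifiers for the standard 12-column -outfmt 6.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(blast_lines, min_pident=99.0):
    """Return {query_id: reference_id}, keeping the highest-bitscore hit per query.

    Hits below min_pident (an illustrative cutoff, not a GMEmbeddings default)
    are ignored, so a query with no acceptable hit is absent from the result.
    """
    best = {}
    for line in blast_lines:
        if not line.strip():
            continue
        rec = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        if float(rec["pident"]) < min_pident:
            continue
        query, bitscore = rec["qseqid"], float(rec["bitscore"])
        if query not in best or bitscore > best[query][1]:
            best[query] = (rec["sseqid"], bitscore)
    return {query: ref for query, (ref, _) in best.items()}


def remap_counts(sample_counts, mapping, reference_ids):
    """Sum one sample's query-ASV counts into reference-ASV columns.

    Queries without an accepted hit are dropped.
    """
    index = {ref: i for i, ref in enumerate(reference_ids)}
    row = [0] * len(reference_ids)
    for query, count in sample_counts.items():
        ref = mapping.get(query)
        if ref is not None and ref in index:
            row[index[ref]] += count
    return row


# Toy BLAST output (tab-separated); values are made up for illustration.
blast_output = [
    "q1\trefA\t100.000\t250\t0\t0\t1\t250\t1\t250\t1e-130\t462",
    "q1\trefB\t99.200\t250\t2\t0\t1\t250\t1\t250\t1e-125\t451",
    "q2\trefB\t99.600\t250\t1\t0\t1\t250\t1\t250\t1e-128\t457",
    "q3\trefC\t92.000\t250\t20\t0\t1\t250\t1\t250\t1e-90\t350",
]

mapping = best_hits(blast_output)
row = remap_counts({"q1": 30, "q2": 12, "q3": 7}, mapping, ["refA", "refB", "refC"])
print(mapping)  # {'q1': 'refA', 'q2': 'refB'}
print(row)      # [30, 12, 0]
```

In this sketch, the counts of `q3`, which falls below the identity cutoff, are silently discarded. A practical pipeline would probably want to report how much of each sample's abundance could not be mapped.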