Babak Saremi scite author profile

Babak Saremi

5 Publications

8 Citation Statements Received

121 Citation Statements Given


Affiliations

Fraunhofer Institute for Toxicology and Experimental Medicine; University of Veterinary Medicine Hannover, Foundation

Publications


Measuring reproducibility of virus metagenomics analyses using bootstrap samples from FASTQ-files

Saremi

Kohls

Liebig

et al. 2020


Motivation: High-throughput sequencing data can be affected by different technical errors, e.g. from probe preparation or false base calling. As a consequence, reproducibility of experiments can be weakened. In virus metagenomics, technical errors can result in falsely identified viruses in samples from infected hosts. We present a new resampling approach based on bootstrap sampling of sequencing reads from FASTQ-files in order to generate artificial replicates of sequencing runs which can help to judge the robustness of an analysis. In addition, we evaluate a mixture model on the distribution of read counts per virus to identify potentially false positive findings. Results: The evaluation of our approach on an artificially generated data set with known viral sequence content shows in general a high reproducibility of uncovering viruses in sequencing data, i.e., the original and mean bootstrap read counts were highly correlated. However, the bootstrap read counts can also indicate reduced or increased evidence for the presence of a virus in the biological sample. We also found that the mixture model fits the read counts well, and furthermore, that it provides a higher accuracy on the original or on the bootstrap read counts than on the difference between both. The usefulness of our methods is further demonstrated on two freely available real world data sets from harbour seals. Availability: We provide a Python tool, called RESEQ, available from https://github.com/babaksaremi/RESEQ that allows efficient generation of bootstrap reads from an original FASTQ-file. Contact: klaus.jung@tiho-hannover.de Supplementary information: Supplementary data are available at Bioinformatics online.
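The read-level bootstrap described in this abstract can be illustrated with a minimal Python sketch. This is not the RESEQ implementation: the function names `parse_fastq`, `bootstrap_reads` and `to_fastq` are hypothetical, and the sketch assumes plain four-line FASTQ records that are small enough to hold in memory.

```python
import random
from typing import List, Optional, Tuple

# One FASTQ record: header, sequence, separator ("+"), quality string.
Record = Tuple[str, ...]


def parse_fastq(text: str) -> List[Record]:
    """Split FASTQ text into four-line records (assumes no wrapped sequences)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain four lines per read")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def bootstrap_reads(records: List[Record], seed: Optional[int] = None) -> List[Record]:
    """Draw as many reads as the original file holds, with replacement."""
    if not records:
        return []
    rng = random.Random(seed)
    return rng.choices(records, k=len(records))


def to_fastq(records: List[Record]) -> str:
    """Serialise records back to FASTQ text."""
    return "".join("\n".join(rec) + "\n" for rec in records)


if __name__ == "__main__":
    # Toy input invented for illustration only.
    sample = (
        "@read1\nACGT\n+\nIIII\n"
        "@read2\nTTGA\n+\nIIII\n"
        "@read3\nGGCC\n+\nIIII\n"
    )
    replicate = bootstrap_reads(parse_fastq(sample), seed=1)
    print(to_fastq(replicate), end="")
```

Each call with a different seed yields one artificial replicate of the sequencing run. Running the same downstream pipeline on several such replicates and comparing the per-virus read counts with the original counts is the kind of robustness check the abstract describes.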


Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin

Kircher

Chludzinski

Krepel

et al. 2022

IJMS


To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently taken by means of DNA microarray or RNA-seq technology. Such data can also be useful to classify infected individuals by molecular signatures in the form of machine-learning models with genes as predictor variables. Early diagnosis of patients by molecular signatures could also contribute to better treatments. An approach that has rarely been considered for machine-learning models in the context of transcriptomics is data augmentation. For other data types it has been shown that augmentation can improve classification accuracy and prevent overfitting. Here, we compare three strategies for data augmentation of DNA microarray and RNA-seq data from two selected studies on respiratory diseases of viral origin. The first study involves samples of patients with either viral or bacterial origin of the respiratory disease, the second study involves patients with either SARS-CoV-2 or another respiratory virus as disease origin. Specifically, we reanalyze these public datasets to study whether patient classification by transcriptomic signatures can be improved when adding artificial data for training of the machine-learning models. Our comparison reveals that augmentation of transcriptomic data can improve the classification accuracy and that fewer genes are necessary as explanatory variables in the final models. We also report genes from our signatures that overlap with signatures presented in the original publications of our example data. Due to strict selection criteria, the molecular role of these genes in the context of respiratory infectious diseases is underlined.


A resampling strategy for studying robustness in virus detection pipelines

Kohls

Saremi

Muchsin

et al. 2021

Computational Biology and Chemistry


A comparison of strategies for generating artificial replicates in RNA-seq experiments

Saremi

Gusmag

Distl

et al. 2022

Sci Rep


Due to the overall high costs, technical replicates are usually omitted in RNA-seq experiments, but several methods exist to generate them artificially. Bootstrapping reads from FASTQ-files has recently been used in the context of other NGS analyses and can be used to generate artificial technical replicates. Bootstrapping samples from the columns of the expression matrix has already been used for DNA microarray data and generates a new artificial replicate of the whole experiment. Mixing data of individual samples has been used for data augmentation in machine learning. The aim of this comparison is to evaluate which of these strategies are best suited to study the reproducibility of differential expression and gene-set enrichment analysis in an RNA-seq experiment. To study the approaches under controlled conditions, we performed a new RNA-seq experiment on gene expression changes upon virus infection compared to untreated control samples. In order to compare the approaches for artificial replicates, each of the samples was sequenced twice, i.e. as true technical replicates, and differential expression analysis and GO term enrichment analysis were conducted separately for the two resulting data sets. Although we observed a high correlation between the results from the two replicates, there are still many genes and GO terms that would be selected from one replicate but not from the other. Cluster analyses showed that artificial replicates generated by bootstrapping reads produce p values and fold changes that are close to those obtained from the true data sets. Results generated from artificial replicates with the approaches of column bootstrap or mixing observations were less similar to the results from the true replicates. Furthermore, the overlap of results among replicates generated by column bootstrap or mixing observations was much stronger than among the true replicates.
Artificial technical replicates generated by bootstrapping sequencing reads from FASTQ-files are better suited to study the reproducibility of results from differential expression and GO term enrichment analysis in RNA-seq experiments than column bootstrap or mixing observations. However, FASTQ-bootstrapping is computationally more expensive than the other two approaches. The FASTQ-bootstrapping may be applicable to other applications of high-throughput sequencing.
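For contrast with read-level bootstrapping, the two matrix-level strategies named in this abstract can be sketched in a few lines of Python. The abstract does not give the exact procedures, so two assumptions are made here: the column bootstrap resamples samples with replacement within each experimental group, and mixing is a weighted average of two samples. The function names `column_bootstrap` and `mix_samples` are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

Matrix = List[List[float]]  # rows are genes, columns are samples


def column_bootstrap(matrix: Matrix, groups: Sequence[str],
                     seed: Optional[int] = None) -> Matrix:
    """Resample sample columns with replacement, within each group (assumption)."""
    rng = random.Random(seed)
    by_group: Dict[str, List[int]] = {}
    for col, group in enumerate(groups):
        by_group.setdefault(group, []).append(col)
    # For every original column, pick a random column from the same group.
    picked = [rng.choice(by_group[group]) for group in groups]
    return [[row[col] for col in picked] for row in matrix]


def mix_samples(matrix: Matrix, a: int, b: int, weight: float = 0.5) -> List[float]:
    """Create one artificial sample as a weighted average of columns a and b (assumption)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return [weight * row[a] + (1.0 - weight) * row[b] for row in matrix]


if __name__ == "__main__":
    # Toy expression values invented for illustration only.
    expr = [[1.0, 2.0, 10.0, 20.0],
            [3.0, 4.0, 30.0, 40.0]]
    labels = ["control", "control", "virus", "virus"]
    print(column_bootstrap(expr, labels, seed=0))
    print(mix_samples(expr, 0, 1))
```

Both functions only reuse values already present in the expression matrix, whereas read-level bootstrapping starts again from the raw sequencing reads. That is consistent with the abstract's remark that FASTQ-bootstrapping is the computationally more expensive approach.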


Genealyzer: web application for the analysis and comparison of gene expression data

2023


Background: Gene expression profiling is a widely adopted method in areas like drug development or functional gene analysis. Microarray data of gene expression experiments is still commonly used and widely available for retrospective analyses. However, due to changes of the underlying technologies, data sets from different technologies are often difficult to compare, and thus a multitude of already available data becomes difficult to use. We present a web application that abstracts away mathematical and programmatic details in order to enable a convenient and customizable analysis of microarray data for large-scale reproducibility studies. In addition, the web application provides a feature that allows easy access to large microarray repositories. Results: Our web application consists of three basic steps which are necessary for a differential gene expression analysis as well as Gene Ontology (GO) enrichment analysis and the comparison of multiple analysis results. Genealyzer can handle Affymetrix data as well as one-channel and two-channel Agilent data. All steps are visualized with meaningful plots. The application offers flexible analysis while being intuitively operable. Conclusions: Our web application provides a unified platform for analysing microarray data, while allowing users to compare the results of different technologies and organisms. Beyond reproducibility, this also offers many possibilities for gaining further insights from existing study data, especially since data from different technologies or organisms can also be compared. The web application can be accessed via this URL: https://genealyzer.item.fraunhofer.de/. Login credentials can be found at the end.


