Koki Tsuyuzaki scite author profile

Background: Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but for large-scale scRNA-seq datasets, computation time is long and consumes large amounts of memory. Results: In this work, we review the existing fast and memory-efficient PCA algorithms and implementations and evaluate their practical application to large-scale scRNA-seq datasets. Our benchmark shows that some PCA algorithms based on Krylov subspace and randomized singular value decomposition are fast, memory-efficient, and more accurate than the other algorithms. Conclusion: We develop a guideline to select an appropriate PCA implementation based on the differences in the computational environment of users and developers.
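
The abstract above names randomized singular value decomposition as one of the fast, memory-efficient families of PCA algorithms. As a rough illustration only, the sketch below shows the general randomized-SVD idea in NumPy. It is not any of the implementations benchmarked in the paper. The function name `randomized_pca` and the parameters `n_oversamples` and `n_iter` are assumptions made for this example, and the input is assumed to be a dense cells-by-genes matrix that fits in memory.

```python
import numpy as np

def randomized_pca(X, k, n_oversamples=10, n_iter=4, seed=0):
    """Approximate the top-k principal components of X (cells x genes).

    Illustrative sketch of randomized SVD; names and defaults are assumptions.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                    # center each gene (column)
    n_features = Xc.shape[1]

    # Random test matrix: project the data onto k + n_oversamples directions.
    omega = rng.standard_normal((n_features, k + n_oversamples))
    Y = Xc @ omega

    # Power iterations sharpen the captured subspace; QR keeps them stable.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Y)
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Y = Xc @ Z
    Q, _ = np.linalg.qr(Y)

    # Exact SVD of the small projected matrix ((k + n_oversamples) x genes).
    B = Q.T @ Xc
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)

    scores = (Q @ U_small[:, :k]) * s[:k]      # cell coordinates on the PCs
    return scores, Vt[:k], s[:k]               # scores, loadings, singular values
```

Calling `randomized_pca(X, k=10)` on a cells-by-genes array returns the cell scores, the gene loadings, and the leading singular values. The expensive exact SVD is applied only to the small projected matrix, which is where the speed and memory savings of this family of methods come from.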

Benchmarking principal component analysis for large-scale single-cell RNA-sequencing

Tsuyuzaki, Sato, et al. (2019). Preprint.

Principal component analysis (PCA) is an essential method for analyzing single-cell RNA-seq (scRNA-seq) datasets, but large-scale scRNA-seq datasets require long computational times and a large memory capacity. In this work, we review 21 fast and memory-efficient PCA implementations (10 algorithms) and evaluate their application using 4 real and 18 synthetic datasets. Our benchmarking showed that some PCA algorithms are faster, more memory efficient, and more accurate than others. In consideration of the differences in the computational environments of users and developers, we have also developed guidelines to assist with selection of appropriate PCA implementations.

MeSH ORA framework: R/Bioconductor packages to support MeSH over-representation analysis

Tsuyuzaki, Morota, Ishii, et al. (2015). BMC Bioinformatics.

Background: In genome-wide studies, over-representation analysis (ORA) against a set of genes is an essential step for biological interpretation. Many gene annotation resources and software platforms for ORA have been proposed. Recently, Medical Subject Headings (MeSH) terms, which are annotations of PubMed documents, have been used for ORA. MeSH enables the extraction of broader meaning from the gene lists and is expected to become an exhaustive annotation resource for ORA. However, the existing MeSH ORA software platforms are still not sufficient for several reasons. Results: In this work, we developed an original MeSH ORA framework composed of six types of R packages, including MeSH.db, MeSH.AOR.db, MeSH.PCR.db, the org.MeSH.XXX.db-type packages, MeSHDbi, and meshr. Conclusions: Using our framework, users can easily conduct MeSH ORA. By utilizing the enriched MeSH terms, related PubMed documents can be retrieved and saved on local machines within this framework. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0453-z) contains supplementary material, which is available to authorized users.
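
ORA is commonly framed as an upper-tail hypergeometric test: given a gene universe, the genes annotated with a term, and a selected gene list, how surprising is the observed overlap? The sketch below illustrates that general calculation in plain Python. It is an assumption-laden example and not the code of the meshr package, whose exact statistical procedure is not described in the abstract. The function name `ora_pvalue` and its argument names are hypothetical.

```python
from math import comb

def ora_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe  : number of genes in the background set
    n_annotated : genes in the background annotated with the term
    n_selected  : size of the gene list under study
    n_overlap   : selected genes that carry the annotation
    (Illustrative sketch; all names are assumptions.)
    """
    upper = min(n_annotated, n_selected)       # largest possible overlap
    total = comb(n_universe, n_selected)       # all ways to draw the gene list
    tail = sum(
        comb(n_annotated, i) * comb(n_universe - n_annotated, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return tail / total
```

A small P-value for a term suggests that the term is over-represented in the selected list. When many terms are tested, the resulting P-values would normally be adjusted for multiple testing.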

Uncovering hypergraphs of cell-cell interaction from single cell RNA-sequencing data

Tsuyuzaki, Ishii, Nikaido (2019). Preprint.

Complex biological systems can be described as a multitude of cell-cell interactions (CCIs). Recent single-cell RNA-sequencing technologies have enabled the detection of CCIs and related ligand-receptor (L-R) gene expression simultaneously. However, previous data analysis methods have focused on only one-to-one CCIs between two cell types. To also detect many-to-many CCIs, we propose scTensor, a novel method for extracting representative triadic relationships (hypergraphs), which include (i) ligand-expression, (ii) receptor-expression, and (iii) L-R pairs. When applied to simulated and empirical datasets, scTensor was able to detect some hypergraphs including paracrine/autocrine CCI patterns, which cannot be detected by previous methods.

Background: Complex biological systems such as tissue homeostasis [1, 2], neurotransmission [3, 4], immune response [5], ontogenesis [6], and stem cell niches [7, 8] are composed of cell-cell interactions (CCIs). Many molecular biology studies have decomposed the system into constituent parts (e.g., genes, proteins, and metabolites) to clarify [...]

[...] is implicitly hypothesized the CCI as a one-to-one relationship. Therefore, in the case II dataset, many-to-many CCIs, such as the CCIs corresponding to green L-R sets, are hard to detect by the method. This is because for each L-R pair, mean values for any combination of cell types are basically high in such situations, and a P-value corresponding to a one-to-one CCI tends to be large (i.e., not significant); accordingly, the observed L-R coexpression and the null distribution calculated are hard to distinguish. In the analysis of real datasets presented later, however, the L-R gene expression pairs are not always cell-type specific, and it is more natural that the CCI corresponding to the L-R has a many-to-many relationship. This simulation shows that scTensor is a more general method for detecting CCIs and their related L-R pairs at once, irrespective of whether a particular CCI is one-to-one or many-to-many.
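
One way to picture the triadic relationship described above (ligand-expressing cell type, receptor-expressing cell type, L-R pair) is as a three-way array. The sketch below builds such an array from per-cell-type mean expression. It is a simplified illustration under stated assumptions and not the scTensor implementation: the function name `cci_tensor`, the use of a plain product of mean ligand and mean receptor expression as the score, and the input layout are all assumptions, and the decomposition step that would extract hypergraphs from the array is not shown.

```python
import numpy as np

def cci_tensor(mean_expr, genes, lr_pairs):
    """Build T[sender, receiver, pair] from cell-type mean expression.

    mean_expr : (cell types x genes) array of average expression
    genes     : gene names, in the column order of mean_expr
    lr_pairs  : list of (ligand_gene, receptor_gene) tuples
    (Illustrative sketch; names and scoring rule are assumptions.)
    """
    col = {g: i for i, g in enumerate(genes)}
    ligand = mean_expr[:, [col[l] for l, _ in lr_pairs]]     # (types x pairs)
    receptor = mean_expr[:, [col[r] for _, r in lr_pairs]]   # (types x pairs)
    # Outer product over cell types, kept separate for every L-R pair.
    return np.einsum("sp,rp->srp", ligand, receptor)
```

Each frontal slice `T[:, :, p]` is a cell-type-by-cell-type matrix for one L-R pair, so a pattern in which several sender types and several receiver types are active together appears as a block rather than a single entry. That is the many-to-many structure that one-to-one tests struggle to capture.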
