SpatialData: an open and universal data framework for spatial omics

Marconato, Luca; Palla, Giovanni; Yamauchi, Kevin A.; Virshup, Isaac; Heidari, Elyas; Treis, Tim; Toth, Marcella; Shrestha, Ramesh; Vöhringer, Harald; Huber, Wolfgang; Gerstung, Moritz; Moore, Josh; Theis, Fabian J.; Stegle, Oliver

doi:10.1101/2023.05.05.539647

Cited by 14 publications

(11 citation statements)

References 37 publications

“…Finally, as spatial omics datasets continue to increase in size, in the future, we anticipate spatial omics datasets may need to be stored in standardized data infrastructure with lazy representation of larger-than-memory data such as the Zarr file format used in the SpatialData Python library (Marconato et al, 2023). SEraster can potentially be integrated with such data infrastructure to enable rasterization of larger-than-memory data in spatially-indexed chunks rather than loading the entire dataset into memory.…”

Section: Discussion (mentioning)

confidence: 99%

SEraster: a rasterization preprocessing framework for scalable spatial omics data analysis

Aihara, Clifton, Chen et al. 2024

Preprint

Motivation: Spatial omics data demand computational analysis but many analysis tools have computational resource requirements that increase with the number of cells analyzed. This presents scalability challenges as researchers use spatial omics technologies to profile millions of cells.

Results: To enhance the scalability of spatial omics data analysis, we developed a rasterization preprocessing framework called SEraster that aggregates cellular information into spatial pixels. We apply SEraster to both real and simulated spatial omics data prior to spatial variable gene expression analysis to demonstrate that such preprocessing can reduce resource requirements while maintaining high performance. We further integrate SEraster with existing analysis tools to characterize cell-type spatial cooccurrence. Finally, we apply SEraster to enable analysis of a mouse pup spatial omics dataset with over a million cells to identify tissue-level and cell-type-specific spatially variable genes as well as cooccurring cell-types that recapitulate expected organ structures.

Availability and implementation: Source code is available on GitHub (https://github.com/JEFworks-Lab/SEraster) with additional tutorials at https://JEF.works/SEraster.

“…Similarly, it underlies our seamless integration with co-registration methods such that multiple spatial technologies can be jointly queried or analyzed together. In this manner, our approach provides more flexibility than the recently developed SpatialData [58] package, which enforces a standard data framework. Notably, to accommodate the increasing size of spatial multi-modal datasets, we developed GiottoDB, which provides the groundwork that developers and users can use to represent their data through different backends that can scale according to their needs.…”

Section: Discussion (mentioning)

confidence: 99%

Giotto Suite: a multi-scale and technology-agnostic spatial multi-omics analysis ecosystem

Chen, Chávez-Fuentes, O’Brien et al. 2023

Preprint

Emerging spatial omics technologies continue to advance the molecular mapping of tissue architecture and the investigation of gene regulation and cellular crosstalk, which in turn provide new mechanistic insights into a wide range of biological processes and diseases. Such technologies provide an increasingly large amount of information content at multiple spatial scales. However, representing and harmonizing diverse spatial datasets efficiently, including combining multiple modalities or spatial scales in a scalable and flexible manner, remains a substantial challenge. Here, we present Giotto Suite, a suite of open-source software packages that underlies a fully modular and integrated spatial data analysis toolbox. At its core, Giotto Suite is centered around an innovative and technology-agnostic data framework embedded in the R software environment, which allows the representation and integration of virtually any type of spatial omics data at any spatial resolution. In addition, Giotto Suite provides both scalable and extensible end-to-end solutions for data analysis, integration, and visualization. Giotto Suite integrates molecular, morphology, spatial, and annotated feature information to create a responsive and flexible workflow for multi-scale, multi-omic data analyses, as demonstrated here by applications to several state-of-the-art spatial technologies. Furthermore, Giotto Suite builds upon interoperable interfaces and data structures that bridge the established fields of genomics and spatial data science, thereby enabling independent developers to create custom-engineered pipelines. As such, Giotto Suite creates an immersive ecosystem for spatial multi-omic data analysis.

“…To establish versatile tools, a common strategy involves adopting a shared data structure that seamlessly integrates across diverse technologies. SpatialData [29] serves as one such comprehensive framework, including readers tailored for the most widely used spatial-omics technologies. Building upon this, Sopa converts any data into a SpatialData object, on which all of the six following tasks are performed.…”

Section: Technology-invariant Pipeline (mentioning)

confidence: 99%

“…This also facilitates geometry-related operations, such as cell expansion, area/perimeter computations, and cell-cell intersections. Combined with the image lazy loading offered by SpatialData [29] and Xarray [34], we implement a fast channel averaging on cell boundaries by combining geometry operations and image chunk lazy loading (see Figure 2d), i.e., deferring loading until needed for processing. Additionally, using memory-efficient tools like Dask [31], we extend geometric operations of GeoPandas [32] on chunks of transcripts, ensuring parallel processing of as many chunks as possible without exceeding memory limits (see Figure 2e).…”

Section: Memory Efficiency Of Sopa (mentioning)

confidence: 99%

Sopa: a technology-invariant pipeline for analyses of image-based spatial-omics

Blampey, Mulder, Dutertre et al. 2023

Preprint

Spatial-omics data allow in-depth analysis of tissue architectures, opening new opportunities for biological discovery. In particular, imaging techniques offer single-cell resolutions, providing essential insights into cellular organizations and dynamics. Yet, the complexity of such data presents analytical challenges and demands substantial computing resources. Moreover, the proliferation of diverse spatial-omics technologies, such as Xenium, MERSCOPE, CosMX in spatial-transcriptomics, and MACSima and PhenoCycler in multiplex imaging, hinders the generality of existing tools. We introduce Sopa (https://github.com/gustaveroussy/sopa), a technology-invariant, memory-efficient pipeline with a unified visualizer for all image-based spatial omics. Built upon the universal SpatialData framework, Sopa optimizes tasks like segmentation, transcript/channel aggregation, annotation, and geometric/spatial analysis. Its output includes user-friendly web reports and visualizer files, as well as comprehensive data files for in-depth analysis. Overall, Sopa represents a significant step toward unifying spatial data analysis, enabling a more comprehensive understanding of cellular interactions and tissue organization in biological systems.
