Covering points by disjoint boxes with outliers

Ahn, Hee-Kap; Bae, Sang Won; Demaine, Erik D.; Demaine, Martin L.; Kim, Sang-Sub; Korman, Matias; Reinbacher, Iris; Son, Wanbin

doi:10.1016/j.comgeo.2010.10.002

Cited by 19 publications (6 citation statements)

References 22 publications


“…In particular, if one were to have access to an oracle that could find the optimal covering of a data set for any radius, our problem could be solved by finding the minimum radius that gives the desired number of covering spheres (e.g., via binary search). Unfortunately, finding the minimum-cardinality cover is NP-complete (Attali et al, 2016), and although algorithms for a variety of simplified settings have been studied (Ahn et al, 2011; Alt et al, 2006; Chan and Hu, 2015; Chvatal, 1979), none scales to the high-dimensional and large-scale data that we need to handle in single-cell genomics. Given the hardness of the covering problem, we aimed to devise an approximate covering algorithm that readily scales to large-scale single-cell data while maintaining good sketch quality.…”

Section: Methods (mentioning; confidence: 99%)

Geometric Sketching Compactly Summarizes the Single-Cell Transcriptomic Landscape

Hie, Cho, DeMeo, et al. 2019. Preprint.


Large-scale single-cell RNA-sequencing (scRNA-seq) studies that profile hundreds of thousands of cells are becoming increasingly common, overwhelming existing analysis pipelines. Here, we describe how to enhance and accelerate single-cell data analysis by summarizing the transcriptomic heterogeneity within a data set using a small subset of cells, which we refer to as a geometric sketch. Our sketches provide more comprehensive visualization of transcriptional diversity, capture rare cell types with high sensitivity, and accurately reveal biological cell types via clustering. Our sketch of umbilical cord blood cells uncovers a rare subpopulation of inflammatory macrophages, which we experimentally validated in vitro. The construction of our sketches is extremely fast, which enabled us to accelerate other crucial resource-intensive tasks such as scRNA-seq data integration. We anticipate that our algorithm will become an …

… in a matter of minutes and with an asymptotic runtime that is close to linear in the size of the data set. We empirically demonstrate that our algorithm produces sketches that more evenly represent the transcriptional space covered by the data. We further show that our sketches enhance and accelerate downstream analyses by preserving rare cell types, producing visualizations that broadly capture transcriptomic heterogeneity, facilitating the identification of cell types via …

… transcriptional variability within a data set, allowing researchers to more easily gain insight into rarer transcriptional states.

Rare Cell Types Are Better Preserved Within Geometric Sketches

As suggested by the above results, one of the key advantages of our algorithm is that it naturally increases the representation of rare cell types with sufficient transcriptomic heterogeneity in the subsampled data. Using the four data sets mentioned above, which include cell type labels …

… clustering algorithm (Blondel et al., 2008). Then, we transferred cluster labels to the rest of the data set via k-nearest-neighbor classification and assessed the agreement between our unsupervised cluster labels and the biological cell type labels provided by the original studies …
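The label-transfer step in the excerpt above (propagating cluster labels computed on the sketch to the remaining cells by k-nearest-neighbor classification) can be illustrated with a minimal sketch. It assumes Euclidean distance, brute-force neighbor search, plain majority voting, and k = 3 by default; the excerpt does not say which distance, value of k, or tie-breaking rule the study used, and the function name `knn_transfer` is hypothetical.

```python
import math
from collections import Counter


def knn_transfer(labeled_points, labels, query_points, k=3):
    """Hypothetical helper: give each query point the majority label among
    its k nearest labeled points (brute-force Euclidean search)."""
    transferred = []
    for q in query_points:
        # Indices of labeled points ordered by distance to q, nearest first.
        order = sorted(range(len(labeled_points)),
                       key=lambda i: math.dist(q, labeled_points[i]))
        # Count the labels of the k nearest neighbors.
        votes = Counter(labels[i] for i in order[:k])
        # most_common(1) returns the first-counted label among equal counts,
        # i.e. the label whose first vote came from the nearer neighbor.
        transferred.append(votes.most_common(1)[0][0])
    return transferred


# Usage: two labeled groups in the "sketch", two unlabeled cells to classify.
sketch = [(0, 0), (0, 1), (10, 10), (10, 11)]
cluster_labels = ["A", "A", "B", "B"]
print(knn_transfer(sketch, cluster_labels, [(1, 0), (9, 10)]))
```

Brute-force search keeps the example self-contained; at the data-set sizes the excerpt discusses, a spatial index or approximate nearest-neighbor library would normally replace the sorted scan.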



“…In particular, if one were to have access to an oracle that could find the optimal covering of a dataset for any radius, our problem could be solved by finding the minimum radius that gives the desired number of covering spheres (e.g., via binary search). Unfortunately, finding the minimum-cardinality cover is NP-complete (Attali et al, 2016), and although algorithms for a variety of simplified settings have been studied (Ahn et al, 2011; Alt et al, 2006; Chan and Hu, 2015; Chvatal, 1979), none scales to the high-dimensional and large-scale data that we need to handle in single-cell genomics. Given the hardness of the covering problem, we aimed to devise an approximate covering algorithm that readily scales to large-scale single-cell data while maintaining good sketch quality.…”

Section: Theoretical Connection To Covering Problems (mentioning; confidence: 99%)

Geometric Sketching Compactly Summarizes the Single-Cell Transcriptomic Landscape

Hie, Cho, DeMeo, et al. 2019. Cell Systems.
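The reduction described in the citation statement above can be illustrated on a toy input: if an exact covering oracle were available, the minimum radius that achieves at most k covering balls could be found by binary search, because the optimal number of balls never increases as the radius grows. The sketch below rests on assumptions that are not in the source: Euclidean distance, ball centers restricted to the input points (so only 0 and the pairwise distances need to be tried as radii), and a brute-force stand-in for the oracle that is exponential in the number of points, in line with the hardness noted in the quote. The names `min_cover_size` and `min_radius_for_k` are hypothetical.

```python
import math
from itertools import combinations


def min_cover_size(points, r):
    """Hypothetical exact oracle: fewest radius-r balls, centered at input
    points, that cover every point. Brute force, so tiny inputs only."""
    n = len(points)
    for k in range(1, n + 1):
        for centers in combinations(range(n), k):
            if all(any(math.dist(points[c], p) <= r for c in centers)
                   for p in points):
                return k
    return 0  # only reached when points is empty


def min_radius_for_k(points, k):
    """Smallest candidate radius r with min_cover_size(points, r) <= k (k >= 1).

    With centers restricted to input points, coverage changes only when r
    crosses a pairwise distance, so 0 and those distances are the candidates.
    """
    candidates = sorted({0.0} | {math.dist(p, q)
                                 for p, q in combinations(points, 2)})
    lo, hi = 0, len(candidates) - 1  # the largest candidate always allows k = 1
    while lo < hi:
        mid = (lo + hi) // 2
        if min_cover_size(points, candidates[mid]) <= k:
            hi = mid      # feasible: the answer is at mid or smaller
        else:
            lo = mid + 1  # infeasible: the answer is strictly larger
    return candidates[lo]


pts = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(min_radius_for_k(pts, 2), min_radius_for_k(pts, 1))
```

For the four collinear points above, two balls of radius 1.0 suffice, whereas a single ball needs radius 10.0. Binary search calls the oracle only O(log m) times for m candidate radii, but each call here is exponential, which is exactly why the quoted authors turn to an approximate covering algorithm instead.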


“…This is the motivation behind the RR and DA algorithms presented in this paper. There has been significant past work on covering points with various geometrical objects [8]-[15]. Our approach draws contrast from these methods in that we seek to use lines as a satisfactory approximation of a multi-class dataset, instead of a precise covering of all points.…”

Section: A. Finding Co-linear Classes (mentioning; confidence: 99%)

One Line To Rule Them All: Generating LO-Shot Soft-Label Prototypes

Sucholutsky, Kim, Browne, et al. 2021. 2021 International Joint Conference on Neural Networks (IJCNN).


Increasingly large datasets are rapidly driving up the computational costs of machine learning. Prototype generation methods aim to create a small set of synthetic observations that accurately represent a training dataset but greatly reduce the computational cost of learning from it. Assigning soft labels to prototypes can allow increasingly small sets of prototypes to accurately represent the original training dataset. Although foundational work on 'less than one'-shot learning has proven the theoretical plausibility of learning with fewer than one observation per class, developing practical algorithms for generating such prototypes remains an unexplored territory. We propose a novel, modular method for generating soft-label prototypical lines that still maintains representational accuracy even when there are fewer prototypes than the number of classes in the data. In addition, we propose the Hierarchical Soft-Label Prototype k-Nearest Neighbor classification algorithm based on these prototypical lines. We show that our method maintains high classification accuracy while greatly reducing the number of prototypes required to represent a dataset, even when working with severely imbalanced and difficult data. Our code is available at https://github.com/ilia10000/SLkNN.
