An instance of colorful k-center consists of points in a metric space that are colored red or blue, along with an integer k and a coverage requirement for each color. The goal is to find the smallest radius ρ such that there exist balls of radius ρ around k of the points that meet the coverage requirements. The motivation behind this problem is twofold. First, from fairness considerations: each color/group should receive a similar service guarantee, and second, from the algorithmic challenges it poses: this problem combines the difficulties of clustering with those of the subset-sum problem. In particular, we show that this combination results in strong integrality gap lower bounds for several natural linear programming relaxations. Our main result is an efficient approximation algorithm that overcomes these difficulties to achieve an approximation guarantee of 3, nearly matching the tight approximation guarantee of 2 for the classical k-center problem, which this problem generalizes.
* A preliminary version of this work was presented at the 21st Conference on Integer Programming and Combinatorial Optimization (IPCO 2020). An independent work of Anegg et al. [2], presented at the same venue, gave a 4-approximation for Colorful k-Center with constantly many colors using different techniques. This work is supported by the Swiss National Science Foundation project 200021-184656 "Randomness in Problem Instances and Randomized Algorithms."
An instance of colorful k-center consists of points in a metric space that are colored red or blue, along with an integer k and a coverage requirement for each color. The goal is to find the smallest radius ρ such that there exist balls of radius ρ around k of the points that meet the coverage requirements. The motivation behind this problem is twofold. First, from fairness considerations: each color/group should receive a similar service guarantee, and second, from the algorithmic challenges it poses: this problem combines the difficulties of clustering along with the subset-sum problem. In particular, we show that this combination results in strong integrality gap lower bounds for several natural linear programming relaxations. Our main result is an efficient approximation algorithm that overcomes these difficulties to achieve an approximation guarantee of 3, nearly matching the tight approximation guarantee of 2 for the classical k-center problem which this problem generalizes.
algorithms either opened more than k centers or only worked in the special case when the input points are in the plane.
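To make the problem statement concrete, the following is a minimal brute-force sketch of the colorful k-center objective. It is not the 3-approximation algorithm described in the abstract: it simply tries every set of k centers and runs in exponential time. The function names, the Euclidean metric, and the representation of requirements as a dictionary are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def covered_counts(points, colors, centers, radius):
    """Count, per color, the points within `radius` of some chosen center."""
    counts = {}
    for p, c in zip(points, colors):
        if any(dist(p, points[i]) <= radius for i in centers):
            counts[c] = counts.get(c, 0) + 1
    return counts


def colorful_k_center_bruteforce(points, colors, k, requirements):
    """Return (radius, centers) for the smallest feasible radius, or None.

    Illustration only: enumerates all k-subsets of the points as centers.
    `requirements` maps each color to the number of points of that color
    that must be covered.
    """
    # An optimal radius is always one of the pairwise distances (including 0).
    candidates = sorted({dist(p, q) for p in points for q in points})
    for radius in candidates:
        for centers in combinations(range(len(points)), k):
            counts = covered_counts(points, colors, centers, radius)
            if all(counts.get(c, 0) >= need for c, need in requirements.items()):
                return radius, centers
    return None


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (10, 0), (11, 0), (50, 0)]
    cols = ["red", "blue", "red", "blue", "red"]
    print(colorful_k_center_bruteforce(pts, cols, 2, {"red": 2, "blue": 2}))
```

In this toy instance the far-away red point at (50, 0) can be left uncovered because the red requirement is only 2, which is what distinguishes the colorful variant from classical k-center.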
Background: Over the last decade, the rapid development of high-throughput sequencing platforms has accelerated species description and assisted morphological classification through DNA barcoding. However, current high-throughput DNA barcoding methods either cannot obtain full-length barcode sequences due to read length limitations (e.g. a maximum read length of 300 bp for Illumina’s MiSeq system), or are hindered by a relatively high cost or low sequencing output (e.g. a maximum of eight million reads per cell for PacBio’s SEQUEL II system). Results: Pooled cytochrome c oxidase subunit I (COI) barcodes from individual specimens were sequenced on the MGISEQ-2000 platform using the single-end 400 bp (SE400) module. We present a bioinformatic pipeline, HIFI-SE, that takes reads generated from the 5′ and 3′ ends of the COI barcode region and assembles them into full-length barcodes. HIFI-SE is written in Python and includes four functional modules: filter, assign, assembly and taxonomy. We applied HIFI-SE to a set of 845 samples (30 marine invertebrates, 815 insects) and obtained a total of 747 fully assembled COI barcodes as well as 70 sequences from Wolbachia and fungal symbionts. Compared to their corresponding Sanger sequences (72 sequences available), nearly all samples (71/72) were correctly and accurately assembled, including 46 samples that had a similarity score of 100% and 25 of ca. 99%. Conclusions: The HIFI-SE pipeline represents an efficient way to produce standard full-length barcodes, while the reasonable cost and high sensitivity of our method make it possible to obtain considerably more DNA barcodes under the same budget. Our method thereby advances DNA-based species identification from diverse ecosystems and increases the number of relevant applications.
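The idea of joining a read from the 5′ end with a read from the 3′ end into one full-length sequence can be illustrated with a simple suffix/prefix overlap merge. This is only a sketch of the general technique and is not taken from HIFI-SE: the function name `merge_by_overlap`, its parameters, and the assumption that both reads are already in the same orientation are hypothetical.

```python
def merge_by_overlap(five_prime, three_prime, min_overlap=10, max_mismatch=0):
    """Join two sequences at their longest suffix/prefix overlap.

    Hypothetical helper, not part of HIFI-SE. Both reads are assumed to be
    on the same strand. Returns the merged sequence, or None when no overlap
    of at least `min_overlap` bases with at most `max_mismatch` mismatches
    is found.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(five_prime), len(three_prime))
    # Try the longest possible overlap first, then shrink.
    for size in range(longest, min_overlap - 1, -1):
        tail = five_prime[-size:]
        head = three_prime[:size]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch:
            # Keep the 5' read whole and append the non-overlapping 3' part.
            return five_prime + three_prime[size:]
    return None


if __name__ == "__main__":
    left = "ACGTACGTTTGACCA"
    right = "TTGACCAGGGTTTAA"
    print(merge_by_overlap(left, right, min_overlap=5))
```

A real pipeline would also need to handle sequencing errors, quality scores, and reverse-complemented reads, none of which this sketch attempts.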
In the Non-Uniform k-Center problem we need to cover a finite metric space using k balls of different radii that can be scaled uniformly. The goal is to minimize the scaling factor. If the number of different radii is unbounded, the problem does not admit a constant-factor approximation algorithm, but it has been conjectured that such an algorithm exists if the number of radii is constant. Yet, this is known only for the case of two radii. Our first contribution is a simple black-box reduction which shows that if one can handle the variant of t − 1 radii with outliers, then one can also handle t radii. Together with an algorithm by Chakrabarty and Negahbani for two radii with outliers, this immediately implies a constant-factor approximation algorithm for three radii, thus making further progress on the conjecture. Furthermore, using algorithms for the k-center with outliers problem, that is, the case of one radius with outliers, we also get a simple algorithm for two radii. The algorithm by Chakrabarty and Negahbani uses a top-down approach, starting with the larger radius and then proceeding to the smaller one. Our reduction, on the other hand, looks only at the smallest radius and eliminates it, which suggests that a bottom-up approach is promising. In this spirit, we devise a modification of the Chakrabarty and Negahbani algorithm which runs in a bottom-up fashion, and in this way we recover their result with the advantage of having a simpler analysis.
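The objective of Non-Uniform k-Center can be written down directly: for a fixed placement of the balls, the smallest uniform scaling factor is the largest, over all points, of the smallest ratio between the distance to a center and that center's radius. The sketch below evaluates this objective and, for tiny instances, minimizes it by exhaustive search. It does not implement the reduction or the Chakrabarty and Negahbani algorithm; the function names, the Euclidean metric, and the restriction of centers to distinct input points are assumptions made for illustration.

```python
from itertools import permutations
from math import dist


def scaling_factor(points, placed_balls):
    """Smallest alpha such that every point is within alpha * r of some center.

    `placed_balls` is a list of (center, radius) pairs with radius > 0.
    """
    return max(min(dist(p, c) / r for c, r in placed_balls) for p in points)


def nukc_bruteforce(points, radii):
    """Exhaustively place one ball per radius on distinct input points.

    Returns (alpha, centers), where centers[i] is the index of the point
    that hosts the ball with radius radii[i]. Illustration only.
    """
    best = None
    for centers in permutations(range(len(points)), len(radii)):
        balls = [(points[i], r) for i, r in zip(centers, radii)]
        alpha = scaling_factor(points, balls)
        if best is None or alpha < best[0]:
            best = (alpha, centers)
    return best


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (2, 0), (10, 0)]
    print(nukc_bruteforce(pts, [1.0, 0.5]))
```

In the example, the larger ball is best spent on the three nearby points and the smaller one on the isolated point, which gives a scaling factor of 1.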
We study the problem of explainable clustering in the setting first formalized by Moshkovitz, Dasgupta, Rashtchian, and Frost (ICML 2020). A k-clustering is said to be explainable if it is given by a decision tree where each internal node splits data points with a threshold cut in a single dimension (feature), and each of the k leaves corresponds to a cluster. We give an algorithm that outputs an explainable clustering that loses at most a factor of O(log² k) compared to an optimal (not necessarily explainable) clustering for the k-medians objective, and a factor of O(k log² k) for the k-means objective. This improves over the previous best upper bounds of O(k) and O(k²), respectively, and nearly matches the previous Ω(log k) lower bound for k-medians and our new Ω(k) lower bound for k-means. The algorithm is remarkably simple. In particular, given an initial not necessarily explainable clustering in ℝ^d, it is oblivious to the data points and runs in time O(dk log² k), independent of the number of data points n. Our upper and lower bounds also generalize to objectives given by higher ℓ_p-norms.
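The definition of an explainable clustering, a decision tree whose internal nodes are single-feature threshold cuts and whose leaves are clusters, can be sketched as a small data structure. The code below only shows how such a tree assigns and explains a point; it does not implement the algorithm from the abstract. The class names `Leaf` and `Cut`, the convention that the left branch takes values at or below the threshold, and the example tree are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    cluster: int  # identifier of the cluster represented by this leaf


@dataclass
class Cut:
    feature: int      # index of the single dimension used by this cut
    threshold: float  # points with x[feature] <= threshold go left
    left: "Node"
    right: "Node"


Node = Union[Leaf, Cut]


def assign(tree, x):
    """Follow threshold cuts from the root to a leaf and return its cluster."""
    node = tree
    while isinstance(node, Cut):
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.cluster


def explain(tree, x):
    """Return the sequence of threshold tests that leads x to its cluster."""
    node, steps = tree, []
    while isinstance(node, Cut):
        goes_left = x[node.feature] <= node.threshold
        steps.append(f"x[{node.feature}] {'<=' if goes_left else '>'} {node.threshold}")
        node = node.left if goes_left else node.right
    return steps


if __name__ == "__main__":
    # A hypothetical tree with k = 3 leaves over two features.
    tree = Cut(0, 5.0, Leaf(0), Cut(1, 2.0, Leaf(1), Leaf(2)))
    print(assign(tree, (7.0, 3.0)), explain(tree, (7.0, 3.0)))
```

Each cluster is thus described by at most a handful of axis-aligned conditions, which is what makes the clustering explainable in this setting.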