Hierarchical clustering is a recursive partitioning of a dataset into clusters at an increasingly finer granularity. Motivated by the fact that most work on hierarchical clustering was based on providing algorithms, rather than optimizing a specific objective, Dasgupta (2016) framed similarity-based hierarchical clustering as a combinatorial optimization problem, where a 'good' hierarchical clustering is one that minimizes some cost function. He showed that this cost function has certain desirable properties: to achieve optimal cost, disconnected components must be separated first, and in 'structureless' graphs, i.e., cliques, all clusterings achieve the same cost.
We take an axiomatic approach to defining 'good' objective functions for both similarity-based and dissimilarity-based hierarchical clustering. We characterize a set of admissible objective functions (that includes the one introduced by Dasgupta) that have the property that when the input admits a 'natural' ground-truth hierarchical clustering, the ground-truth clustering has an optimal value.
Equipped with a suitable objective function, we analyze the performance of practical algorithms and develop better and faster algorithms for hierarchical clustering. For similarity-based hierarchical clustering, Dasgupta (2016) showed that a simple recursive sparsest-cut based approach achieves an O(log^{3/2} n)-approximation on worst-case inputs. We give a more refined analysis of the algorithm and show that it in fact achieves an O(√(log n))-approximation. This improves upon the LP-based O(log n)-approximation of Roy and Pokutta (2016). For dissimilarity-based hierarchical clustering, we show that the classic average-linkage algorithm gives a factor 2 approximation, and provide a simple and better algorithm that gives a factor 3/2 approximation. This aims at explaining the success of these heuristics in practice.
Finally, we consider a 'beyond-worst-case' scenario through a generalisation of the stochastic block model for hierarchical clustering. We show that Dasgupta's cost function also has desirable properties for these inputs, and we provide a simple algorithm that, for graphs generated according to this model, yields a 1 + o(1) factor approximation.
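The two properties attributed to Dasgupta's cost function can be checked on tiny inputs. The sketch below assumes the standard form of that cost, which the abstract does not spell out: each pair of points contributes its similarity weight times the number of leaves under the pair's lowest common ancestor. The nested-tuple tree encoding and the names `leaves` and `dasgupta_cost` are illustrative choices, not taken from the paper.

```python
def leaves(tree):
    """Return the leaf labels of a binary tree given as nested 2-tuples."""
    if not isinstance(tree, tuple):
        return [tree]
    return [x for child in tree for x in leaves(child)]


def dasgupta_cost(tree, weight):
    """Sum over pairs (i, j) of weight(i, j) * (#leaves under lca(i, j)).

    Assumed form of the cost; `weight` is any symmetric similarity function.
    """
    if not isinstance(tree, tuple):
        return 0
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    size = len(left_leaves) + len(right_leaves)
    # Pairs split at this node have this node as their lowest common ancestor.
    cut = sum(weight(i, j) for i in left_leaves for j in right_leaves)
    return size * cut + dasgupta_cost(left, weight) + dasgupta_cost(right, weight)


# Clique on 4 points with unit weights: every binary tree has the same cost.
clique = lambda i, j: 1
balanced_cost = dasgupta_cost(((0, 1), (2, 3)), clique)      # 20
caterpillar_cost = dasgupta_cost((0, (1, (2, 3))), clique)   # 20

# Two disconnected edges {0,1} and {2,3}: separating the components first is cheaper.
edges = {frozenset((0, 1)), frozenset((2, 3))}
two_parts = lambda i, j: 1 if frozenset((i, j)) in edges else 0
split_first = dasgupta_cost(((0, 1), (2, 3)), two_parts)     # 4
mixed = dasgupta_cost(((0, 2), (1, 3)), two_parts)           # 8
```

The recursion is quadratic per node and is meant only to make the definition concrete, not to be efficient.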
We present three semi-streaming algorithms for Maximum Bipartite Matching with one and two passes. Our one-pass semi-streaming algorithm is deterministic and returns a matching of size at least 1/2 + 0.005 times the optimal matching size in expectation, assuming that edges arrive one by one in (uniform) random order. Our first two-pass algorithm is randomized and returns a matching of size at least 1/2 + 0.019 times the optimal matching size in expectation (over its internal random coin flips) for any arrival order. These two algorithms apply the simple Greedy matching algorithm several times on carefully chosen subgraphs as a subroutine. Furthermore, we present a two-pass deterministic algorithm for any arrival order returning a matching of size at least 1/2 + 0.019 times the optimal matching size. This algorithm is built on ideas from the computation of semi-matchings.
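For orientation, the Greedy subroutine mentioned above can be sketched as a single pass over the edge stream. This is only the baseline, not any of the three algorithms in the abstract: Greedy produces a maximal matching, which is at least half the size of a maximum matching, and the abstract's results improve on that 1/2 factor. The function name and the edge representation are illustrative assumptions.

```python
def greedy_matching(edge_stream):
    """One pass over the stream: keep an edge iff both endpoints are still free."""
    matched = set()   # vertices already covered by the matching
    matching = []
    for u, v in edge_stream:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching


# The same bipartite graph in two arrival orders (maximum matching has size 2).
bad_order = [("a1", "b1"), ("a0", "b1"), ("a1", "b2")]
good_order = [("a0", "b1"), ("a1", "b2"), ("a1", "b1")]
bad_result = greedy_matching(bad_order)     # [("a1", "b1")]
good_result = greedy_matching(good_order)   # [("a0", "b1"), ("a1", "b2")]
```

The example shows why arrival order matters: on the first order Greedy stops at exactly half the optimum, while on the second it finds a maximum matching.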
How should players bid in keyword auctions such as those used by Google, Yahoo! and MSN? We consider greedy bidding strategies for a repeated auction on a single keyword, where in each round, each player chooses some optimal bid for the next round, assuming that the other players merely repeat their previous bid. We study the revenue, convergence and robustness properties of such strategies. Most interesting among these is a strategy we call the balanced bidding strategy (bb): it is known that bb has a unique fixed point with payments identical to those of the VCG mechanism. We show that if all players use the bb strategy and update each round, bb converges when the number of slots is at most 2, but does not always converge for 3 or more slots. On the other hand, we present a simple variant which is guaranteed to converge to the same fixed point for any number of slots. In a model in which only one randomly chosen player updates each round according to the bb strategy, we prove that convergence occurs with probability 1. We complement our theoretical results with empirical studies.
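A single bb update can be sketched as follows. The abstract does not define the strategy, so the rule below is an assumption based on how balanced bidding is usually described: the player targets the slot that maximizes click-through rate times (value minus price), given the other players' previous bids, and then bids so that it would be indifferent between that slot at its current price and the slot above at a price equal to its own bid. The top-slot convention `(value + price) / 2`, the next-price payment rule, and all names are likewise assumptions for illustration.

```python
def balanced_bid(value, other_bids, ctrs):
    """One balanced-bidding update for a single player (illustrative sketch).

    value      -- the player's value per click
    other_bids -- the other players' bids from the previous round
    ctrs       -- click-through rates of the slots, highest first (at least one)
    """
    prices = sorted(other_bids, reverse=True) + [0.0]
    n_positions = len(prices)
    # Positions beyond the last slot receive no clicks.
    theta = list(ctrs[:n_positions]) + [0.0] * max(0, n_positions - len(ctrs))
    # Taking position s means paying the s-th highest competing bid (assumed rule).
    target = max(range(n_positions), key=lambda s: theta[s] * (value - prices[s]))
    if target == 0:
        return (value + prices[0]) / 2  # assumed convention for the top slot
    # Solve theta[target] * (value - prices[target]) == theta[target - 1] * (value - bid).
    return value - theta[target] / theta[target - 1] * (value - prices[target])


second_slot_bid = balanced_bid(10, [8, 2], [1.0, 0.5])   # targets slot 2, bids 6.0
top_slot_bid = balanced_bid(10, [3, 2], [1.0, 0.5])      # targets slot 1, bids 6.5
no_slot_bid = balanced_bid(1, [8, 2], [1.0, 0.5])        # no profitable slot, bids 1.0
```

Under this rule a player that cannot profitably win any slot bids its true value, which follows from the indifference equation when the targeted position has click-through rate zero.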