The k-means method is an old but popular clustering algorithm known for its observed speed and its simplicity. Until recently, however, no meaningful theoretical bounds were known on its running time. In this paper, we demonstrate that the worst-case running time of k-means is superpolynomial by improving the best known lower bound from Ω(n) iterations to 2 Ω( √ n) .
We study the use of viral marketing strategies on social networks to maximize revenue from the sale of a single product. We propose a model in which the decision of a buyer to buy the product is influenced by friends that own the product and the price at which the product is offered. The influence model we analyze is quite general, naturally extending both the Linear Threshold model and the Independent Cascade model, while also incorporating price information. We consider sales proceeding in a cascading manner through the network, i.e. a buyer is offered the product via recommendations from its neighbors who own the product. In this setting, the seller influences events by offering a cashback to recommenders and by setting prices (via coupons or discounts) for each buyer in the social network.Finding a seller strategy which maximizes the expected revenue in this setting turns out to be NPhard. However, we propose a seller strategy that generates revenue guaranteed to be within a constant factor of the optimal strategy in a wide variety of models. The strategy is based on an influence-andexploit idea, and it consists of finding the right trade-off at each time step between: generating revenue from the current user versus offering the product for free and using the influence generated from this sale later in the process. We also show how local search can be used to improve the performance of this technique in practice.
We show a worst-case lower bound and a smoothed upper bound on the number of iterations performed by the Iterative Closest Point (ICP) algorithm. First proposed by Besl and McKay, the algorithm is widely used in computational geometry where it is known for its simplicity and its observed speed. The theoretical study of ICP was initiated by Ezra, Sharir and Efrat, who bounded its worst-case running time between Ω(n log n) and O(n 2 d) d . We substantially tighten this gap by improving the lower bound to Ω(n/d) d+1 . To help reconcile this bound with the algorithm's observed speed, we also show the smoothed complexity of ICP is polynomial, independent of the dimensionality of the data. Using similar methods, we improve the best known smoothed upper bound for the popular k-means method to n O(k) , once again independent of the dimension.
The k-means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k-means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points.In this paper, we settle the smoothed running time of the k-means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/σ, where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k-means method will run in expected polynomial time on that input set. * Supported by a fellowship within the Postdoc-Program of the German Academic Exchange Service (DAAD).
The k -means method is one of the most widely used clustering algorithms, drawing its popularity from its speed in practice. Recently, however, it was shown to have exponential worst-case running time. In order to close the gap between practical performance and theoretical analysis, the k -means method has been studied in the model of smoothed analysis. But even the smoothed analyses so far are unsatisfactory as the bounds are still super-polynomial in the number n of data points. In this article, we settle the smoothed running time of the k -means method. We show that the smoothed number of iterations is bounded by a polynomial in n and 1/ σ , where σ is the standard deviation of the Gaussian perturbations. This means that if an arbitrary input data set is randomly perturbed, then the k -means method will run in expected polynomial time on that input set.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.