Result diversification has many important applications in databases, operations research, information retrieval, and finance. In this paper, we study and extend a particular version of result diversification, known as max-sum diversification. More specifically, we consider the setting where we are given a set of elements in a metric space and a set valuation function f defined on every subset. For any given subset S, the overall objective is a linear combination of f (S) and the sum of the distances induced by S. The goal is to find a subset S satisfying some constraints that maximizes the overall objective.This problem is first studied by Gollapudi and Sharma in [17] for modular set functions and for sets satisfying a cardinality constraint (uniform matroids). In their paper, they give a 2-approximation algorithm by reducing to an earlier result in [20]. The first part of this paper considers an extension of the modular case to the monotone submodular case, for which the algorithm in [17] no longer applies. Interestingly, we are able to maintain the same 2-approximation using a natural, but different greedy algorithm. We then further extend the problem by considering any matroid constraint and show that a natural single swap local search algorithm provides a 2-approximation in this more general setting. This extends the Nemhauser, Wolsey and Fisher approximation result [29] for the problem of submodular function maximization subject to a matroid constraint (without the distance function component).The second part of the paper focuses on dynamic updates for the modular case. Suppose we have a good initial approx- * imate solution and then there is a single weight-perturbation either on the valuation of an element or on the distance between two elements. Given that users expect some stability in the results they see, we ask how easy is it to maintain a good approximation without significantly changing the initial set. We measure this by the number of updates, where each update is a swap of a single element in the current solution with a single element outside the current solution. We show that we can maintain an approximation ratio of 3 by just a single update if the perturbation is not too large.
In practice, almost all dynamic systems require decisions to be made on-line, without full knowledge of their future impact on the system. A general model for the processing of sequences of tasks is introduced, and a general on-line decision algorithm is developed. It is shown that, for an important class of special cases, this algorithm is optimal among all on-line algorithms. Specifically, a task system ( S,d ) for processing sequences of tasks consists of a set S of states and a cost matrix d where d ( i, j is the cost of changing from state i to state j (we assume that d satisfies the triangle inequality and all diagonal entries are 0). The cost of processing a given task depends on the state of the system. A schedule for a sequence T 1 , T 2 ,…, T k of tasks is a sequence s 1 , s 2 ,…, s k of states where s i is the state in which T i is processed; the cost of a schedule is the sum of all task processing costs and the state transition costs incurred. An on-line scheduling algorithm is one that chooses s i only knowing T 1 T 2 … T i . Such an algorithm is w -competitive if, on any input task sequence, its cost is within an additive constant of w times the optimal offline schedule cost. The competitive ratio w ( S , d ) is the infimum w for which there is a w -competitive on-line scheduling algorithm for ( S , d ). It is shown that w ( S , d ) = 2|S|–1 for every task system in which d is symmetric, and w ( S, d ) = O (| S | 2 ) for every task system. Finally, randomized on-line scheduling algorithms are introduced. It is shown that for the uniform task system (in which d ( i,j ) = 1 for all i,j ), the expected competitive ratio w¯ ( S,d ) = O (log|S|).
Abstract. In practice, almost all dynamic systems require decisions to be made on-line, without full knowledge of their future impact on the system. A general model for the processing of sequences of tasks is introduced, and a general on-line decnion algorithm is developed. It is shown that, for an important class of special cases, this algorithm is optimal among all on-line algorithms.Specifically, a task system (S. d) for processing sequences of tasks consists of a set S of states and a cost matrix d where d(i, j) is the cost of changing from state i to state j (we assume that d satisfies the triangle inequality and all diagonal entries are f)). The cost of processing a given task depends on the state of the system. A schedule for a sequence T1, T2,..., Tk of tasks is a 'equence sl,s~, ..., Sk of states where s~i s the state in which T' is processed; the cost of a schedule is the sum of all task processing costs and state transition costs incurred.An on-line scheduling algorithm is one that chooses s, only knowing T1 Tz~.. T'. Such an algorithm is w-competitive if, on any input task sequence, its cost is within an additive constant of w times the optimal offline schedule cost. The competitive ratio w(S, d) is the infimum w for which there is a w-competitive on-line scheduling algorithm for (S, d). It is shown that w(S, d) = 2 ISI -1 for eoery task system in which d is symmetric, and w(S, d) = 0(1 S]2 ) for every task system. Finally, randomized on-line scheduling algorithms are introduced. It is shown that for the uniform task system (in which d(i, j) = 1 for all i, j), the expected competitive ratio w(S, d) = O(log 1s1).
The explosive growth and the widespread accessibility of the Web has led to a surge of research activity in the area of information retrieval on the World Wide Web. The seminal papers of Kleinberg [1998, 1999] and Brin and Page [1998] introduced Link Analysis Ranking, where hyperlink structures are used to determine the relative authority of a Web page and produce improved algorithms for the ranking of Web search results. In this article we work within the hubs and authorities framework defined by Kleinberg and we propose new families of algorithms. Two of the algorithms we propose use a Bayesian approach, as opposed to the usual algebraic and graph theoretic approaches. We also introduce a theoretical framework for the study of Link Analysis Ranking algorithms. The framework allows for the definition of specific properties of Link Analysis Ranking algorithms, as well as for comparing different algorithms. We study the properties of the algorithms that we define, and we provide an axiomatic characterization of the INDEGREE heuristic which ranks each node according to the number of incoming links. We conclude the article with an extensive experimental evaluation. We study the quality of the algorithms, and we examine how different structures in the graphs affect their performance.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.