We consider cluster-based network servers in which a front-end directs incoming requests to one of a number of back-ends. Specifically, we consider content-based request distribution: the front-end uses the content requested, in addition to information about the load on the back-end nodes, to choose which back-end will handle this request. Content-based request distribution can improve locality in the back-ends' main memory caches, increase secondary storage scalability by partitioning the server's database, and provide the ability to employ back-end nodes that are specialized for certain types of requests.

As a specific policy for content-based request distribution, we introduce a simple, practical strategy for locality-aware request distribution (LARD). With LARD, the front-end distributes incoming requests in a manner that achieves high locality in the back-ends' main memory caches as well as load balancing. Locality is increased by dynamically subdividing the server's working set over the back-ends. Trace-based simulation results and measurements on a prototype implementation demonstrate substantial performance improvements over state-of-the-art approaches that use only load information to distribute requests. On workloads with working sets that do not fit in a single server node's main memory cache, the achieved throughput exceeds that of the state-of-the-art approach by a factor of two to four.

With content-based distribution, incoming requests must be handed off to a back-end in a manner transparent to the client, after the front-end has inspected the content of the request. To this end, we introduce an efficient TCP handoff protocol that can hand off an established TCP connection in a client-transparent manner.

To appear in the Proceedings of the Eighth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, CA, Oct 1998.
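The basic LARD policy the abstract describes can be pictured with a minimal sketch. The class name, the two load thresholds, and the connection-count load metric below are illustrative assumptions, not the paper's prototype: a target (e.g. a URL path) sticks to the back-end it was first assigned to, and is reassigned only when that node is overloaded while another node is lightly loaded.

```python
T_LOW = 25    # below this, a node is considered lightly loaded (illustrative value)
T_HIGH = 65   # above this, a node is considered overloaded (illustrative value)

class LardFrontEnd:
    """Hypothetical sketch of a locality-aware request distributor."""

    def __init__(self, backends):
        self.load = {b: 0 for b in backends}   # active connections per back-end
        self.assigned = {}                      # target (e.g. URL path) -> back-end

    def least_loaded(self):
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        node = self.assigned.get(target)
        if node is None:
            # First request for this target: bind it to the least-loaded node,
            # building cache locality for all future requests to this target.
            node = self.least_loaded()
            self.assigned[target] = node
        elif self.load[node] > T_HIGH and self.load[self.least_loaded()] < T_LOW:
            # Assigned node is overloaded while another is nearly idle:
            # move the target, trading some locality for load balance.
            node = self.least_loaded()
            self.assigned[target] = node
        self.load[node] += 1
        return node

    def done(self, node):
        # Called when a request finishes on `node`.
        self.load[node] -= 1
```

Repeated requests for the same target thus hit the same back-end's memory cache, which is how the working set ends up subdivided across the cluster.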
Introduction

Network servers based on clusters of commodity workstations or PCs connected by high-speed LANs combine cutting-edge performance and low cost. A cluster-based network server consists of a front-end, responsible for request distribution, and a number of back-end nodes, responsible for request processing. The use of a front-end makes the distributed nature of the server transparent to the clients. In most current cluster servers the front-end distributes requests to back-end nodes without regard to the type of service or the content requested. That is, all back-end nodes are considered equally capable of serving a given request, and the only factor guiding the request distribution is the current load of the back-end nodes.

With content-based request distribution, the front-end takes into account both the service/content requested and the current load on the back-end nodes when deciding which back-end node should serve a given request. The potential advantages of content-based request distribution are: (1) increased performance due to improved hit rates in the back-ends' main memory caches, (2) increased secon...
We consider Maximal Clique Enumeration (MCE) from a large graph. A maximal clique is perhaps the most fundamental dense substructure in a graph, and MCE is an important tool to discover densely connected subgraphs, with numerous applications to data mining on web graphs, social networks, and biological networks. While effective sequential methods for MCE are known, scalable parallel methods for MCE are still lacking.

We present a new parallel algorithm for MCE, Parallel Enumeration of Cliques using Ordering (PECO), designed for the MapReduce framework. Unlike previous works, which required a post-processing step to remove duplicate and non-maximal cliques, PECO enumerates only maximal cliques with no duplicates. The key technical ingredient is a total ordering of the vertices of the graph, which is used in a novel way to achieve a load-balanced distribution of work and to eliminate redundant work among processors.
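The ownership idea behind such an ordering can be illustrated with a small sequential sketch (this is not the authors' MapReduce code): each vertex owns exactly those maximal cliques whose lowest-ranked vertex it is, so every maximal clique is emitted by exactly one subproblem and no deduplication pass is needed. The degree-based ranking and all function names are illustrative assumptions; the real algorithm also prunes the search inside each subproblem rather than filtering afterward, which is where the redundant work is eliminated.

```python
def rank(graph):
    # One possible total order on vertices: by degree, ties broken by vertex id.
    return {v: i for i, v in enumerate(sorted(graph, key=lambda v: (len(graph[v]), v)))}

def bron_kerbosch(graph, R, P, X, out):
    # Classic (unpivoted) Bron-Kerbosch: collects into `out` the maximal
    # cliques extending R using candidates P, excluding those covered by X.
    if not P and not X:
        out.append(frozenset(R))
        return
    for v in list(P):
        bron_kerbosch(graph, R | {v}, P & graph[v], X & graph[v], out)
        P.remove(v)
        X.add(v)

def peco_style_mce(graph):
    # `graph` is a dict mapping each vertex to its set of neighbors.
    r = rank(graph)
    cliques = []
    for v in graph:
        # Subproblem for v: maximal cliques of v's closed neighborhood that
        # contain v. Any vertex extending such a clique must be adjacent to v,
        # so maximality in the neighborhood implies maximality in the graph.
        local = []
        bron_kerbosch(graph, {v}, set(graph[v]), set(), local)
        for c in local:
            # Emit only if v is the lowest-ranked vertex of the clique:
            # exactly one owner emits each maximal clique, so no duplicates.
            if min(c, key=lambda u: r[u]) == v:
                cliques.append(c)
    return cliques
```

In the MapReduce setting, each subproblem becomes an independent reduce task, and the ordering doubles as the load-balancing lever, since it controls how large each vertex's owned search space is.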
We implemented PECO on Hadoop MapReduce, and our experiments on a cluster show that the algorithm can effectively process a variety of large real-world graphs with millions of vertices and tens of millions of maximal cliques, and scales well with the degree of available parallelism.

Keywords: Graph mining, Maximal clique enumeration, Enumeration algorithm, MapReduce, Hadoop, Parallel algorithm, Clique, Load balancing

Disciplines: Electrical and Computer Engineering

Comments: This is a manuscript of an article from Svendsen, Michael, Arko Provo Mukherjee, and Srikanta Tirthapura. "Mining maximal cliques from a large graph using MapReduce: Tackling highly uneven subproblem sizes."

Highlights:
• Scalable method for enumerating maximal cliques in a graph using MapReduce.
• Effective solution to load balancing.
• Experimental evaluation of our solution on large real-world graphs.
• Outperforms previous MapReduce solutions by orders of magnitude.
We consider the maintenance of the set of all maximal cliques in a dynamic graph that is changing through the addition or deletion of edges. We present nearly tight bounds on the magnitude of change in the set of maximal cliques, as well as the first change-sensitive algorithms for clique maintenance, whose runtime is proportional to the magnitude of the change in the set of maximal cliques. We present experimental results showing these algorithms are efficient in practice, and are faster than prior work by two to three orders of magnitude.
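One way to picture change-sensitive maintenance is the edge-insertion case (deletion is symmetric and omitted here). The sketch below is a hedged illustration under assumptions of ours, not the paper's algorithm: every maximal clique created by a new edge (u, v) must contain both endpoints, so it is {u, v} plus a maximal clique of the subgraph induced on their common neighbors, and the work is local to that neighborhood rather than the whole graph.

```python
def maximal_cliques(adj):
    # Unpivoted Bron-Kerbosch over an adjacency dict; returns frozensets.
    out = []
    def bk(R, P, X):
        if not P and not X:
            out.append(frozenset(R))
            return
        for w in list(P):
            bk(R | {w}, P & adj[w], X & adj[w])
            P.remove(w)
            X.add(w)
    bk(set(), set(adj), set())
    return out

def insert_edge(graph, cliques, u, v):
    """Update the set `cliques` of maximal cliques of `graph` after adding
    the edge (u, v). Illustrative helper, not the paper's change-sensitive
    algorithm: the search is confined to the common neighborhood of u and v."""
    graph[u].add(v)
    graph[v].add(u)
    common = graph[u] & graph[v]
    sub = {w: graph[w] & common for w in common}
    # Each new maximal clique is {u, v} plus a maximal clique of `sub`
    # (when the common neighborhood is empty, this yields just {u, v}).
    new = {c | {u, v} for c in maximal_cliques(sub)}
    # Previously maximal cliques subsumed by a new clique are no longer maximal.
    cliques -= {c for c in cliques if any(c < n for n in new)}
    cliques |= new
```

The change-sensitive point is that the update cost tracks the size of the affected neighborhood and of the change set, which is why it can beat recomputation from scratch by orders of magnitude on large graphs.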