Abstract. Graph partitioning requires the division of a graph's vertex set into k equally sized subsets s. t. some objective function is optimized. High-quality partitions are important for many applications, whose objective functions are often N P-hard to optimize. Most state-of-the-art graph partitioning libraries use a variant of the Kernighan-Lin (KL) heuristic within a multilevel framework. While these libraries are very fast, their solutions do not always meet all user requirements. Moreover, due to its sequential nature, KL is not easy to parallelize. Its use as a load balancer in parallel numerical applications therefore requires complicated adaptations. That is why we developed previously an inherently parallel algorithm, called Bubble-FOS/C (Meyerhenke et al., IPDPS'06), which optimizes partition shapes by a diusive mechanism. However, it is too slow for practical use, despite its high solution quality.In this paper, besides proving that Bubble-FOS/C converges towards a local optimum of a potential function, we develop a much faster method for the improvement of partitionings. This faster method called TruncCons is based on a dierent diusive process, which is restricted to local areas of the graph and also contains a high degree of parallelism. By coupling TruncCons with Bubble-FOS/C in a multilevel framework based on two dierent hierarchy construction methods, we obtain our new graph partitioning heuristic DibaP. Compared to Bubble-FOS/C, DibaP shows a considerable acceleration, while retaining the positive properties of the slower algorithm. Experiments with popular benchmark graphs show that DibaP computes consistently better results than the state-of-the-art libraries METIS and JOSTLE. Moreover, with our new algorithm, we have improved the best known edge-cut values for a signicant number of partitionings of six widely used benchmark graphs.