Graph processing systems are important in the big data domain. However, processing graphs in parallel often introduces redundant computations in existing algorithms and models. Prior work has proposed techniques to optimize redundancies for out-of-core graph systems, rather than distributed graph systems. In this paper, we study various state-of-the-art distributed graph systems and observe root causes for these pervasively existing redundancies. To reduce redundancies without sacrificing parallelism, we further propose
SLFE,
a distributed graph processing system, designed with the principle of "start late or finish early".
SLFE
employs a novel preprocessing stage to obtain a graph's topological knowledge with negligible overhead.
SLFE's
redundancy-aware vertex-centric computation model can then utilize such knowledge to reduce the redundant computations at runtime.
SLFE
also provides a set of APIs to improve programmability. Our experiments on an 8-machine high-performance cluster show that
SLFE
outperforms all well-known distributed graph processing systems with the inputs of real-world graphs, yielding up to 75x speedup. Moreover,
SLFE
outperforms two state-of-the-art shared memory graph systems on a high-end machine with up to 1644x speedup.
SLFE's
redundancy-reduction schemes are generally applicable to other vertex-centric graph processing systems.