Profiling is one of the most effective enablers of compiler
optimizations. Reports of speedups of 20-30% on top of highly optimized
codes are common in the industry. Despite these gains, profiling is
rarely used during development — rather, it is used during the
generation of production-ready executables. Collecting profile
information takes time, and reusing it across different versions of the
same software leads to poor results, particularly for binary-level
optimizations: small changes to the code might invalidate large portions
of otherwise good data. This paper proposes a methodology to mitigate
this problem. When mapping profile information from one program to a
newer version of it, we use branch features, instead of addresses or
hash of instructions, as anchor points between programs. Branch features
have been used for decades as a way to predict the outcome of branches:
they are characteristics of the branch, such as direction and opcode. By
choosing good feature sets, it is possible to minimize collisions
between branches, so that different branches seldom share the same
features. We have modified the BOLT binary optimizer to use branch
features to map profile information across programs. By optimizing three
new versions of four large executables — Clang, GCC, MySQL, and
PostgreSQL — we show that the new approach to reuse profile data
yields speedups of about 8.00% over Clang -O3. In contrast, BOLT’s
default mapping, which uses relative addresses as anchor points, yields
speedups of 6.06%. Previous work based on hashing of basic blocks
yields speedups of 5.02%.