Memcheck's performance is significantly limited by the need for JIT-generated code to call a helper function for every memory access. This talk will describe ongoing work to inline the fast cases of these helpers into the generated code, with the goal of achieving a significant speedup.
Valgrind's Memcheck tool performs address checking at the byte level and definedness checking at the bit level. Most of the definedness-checking code is generated inline by Valgrind's JIT (VEX). Address checking, however, is done by calling small C helper functions, which means one function call for every memory access that needs checking. That is expensive, and largely wasteful: the checks themselves are very simple, so the call overhead is significant relative to the work they actually do.
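To make the per-access cost concrete, here is a deliberately tiny C model of the helper-call approach. Everything in it is a hypothetical simplification, not Memcheck's actual shadow-memory layout or helper interface: the names `toy_set` and `toy_helper_load4`, the flat 64 KiB guest address space, and the choice of one A (addressable) flag plus one byte of V bits per guest byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model (not Memcheck's real layout): a 64 KiB guest
   address space with one A (addressable) flag per byte and one byte of
   V bits per byte, where a set V bit means "this bit is undefined". */
#define TOY_MEM_SIZE 0x10000u

static bool     a_bits[TOY_MEM_SIZE];
static uint8_t  v_bits[TOY_MEM_SIZE];
static unsigned n_errors;

/* Shadow-state update, e.g. what malloc or an initialising store would do. */
static void toy_set(uint32_t addr, uint32_t len, bool addressable, uint8_t vbyte) {
    for (uint32_t i = 0; i < len; i++) {
        uint64_t a = (uint64_t)addr + i;
        assert(a < TOY_MEM_SIZE);
        a_bits[a] = addressable;
        v_bits[a] = vbyte;
    }
}

/* The kind of C helper the generated code would call once per 4-byte load:
   check addressability byte by byte, then gather the V bits of the loaded
   value (little-endian; a set bit means the corresponding bit is undefined). */
static uint32_t toy_helper_load4(uint32_t addr) {
    uint32_t vbits = 0;
    bool bad = false;
    for (uint32_t i = 0; i < 4; i++) {
        uint64_t a = (uint64_t)addr + i;
        if (a >= TOY_MEM_SIZE || !a_bits[a])
            bad = true;   /* unaddressable: this toy treats the byte as defined */
        else
            vbits |= (uint32_t)v_bits[a] << (8 * i);
    }
    if (bad)
        n_errors++;       /* one "invalid read" report per access */
    return vbits;
}
```

Even when the access is aligned, addressable, and fully defined, the caller still pays for a full call and a byte-by-byte loop in this model.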
This talk describes ongoing work to inline the fast paths of such helper functions into the generated code. The fast paths deal with the common case (naturally aligned addresses and fully defined values); all other cases are pushed "off-trace" onto cold paths, possibly with helper calls to C land.
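A sketch of the hot/cold split, under loudly hypothetical assumptions: shadow state is compressed to 2 bits per guest byte, so each naturally aligned 4-byte word has exactly one shadow byte, and a single comparison against an assumed "all four bytes defined" value (0xAA here) decides whether the access can stay on the hot path. The names `load4`, `load4_slow`, `set_state`, and the encoding itself are invented for illustration and are not taken from Memcheck's source.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MEM_SIZE     0x10000u
#define ST_NOACCESS      0u     /* hypothetical 2-bit per-byte states */
#define ST_UNDEF         1u
#define ST_DEFINED       2u
#define WORD_ALL_DEFINED 0xAAu  /* 10 10 10 10: four DEFINED bytes */

static uint8_t  shadow[TOY_MEM_SIZE / 4];  /* one shadow byte per aligned word */
static unsigned slow_calls;

static unsigned byte_state(uint64_t a) {
    return a < TOY_MEM_SIZE ? (shadow[a >> 2] >> (2 * (a & 3u))) & 3u : ST_NOACCESS;
}

static void set_state(uint32_t addr, uint32_t len, unsigned st) {
    for (uint32_t i = 0; i < len; i++) {
        uint64_t a = (uint64_t)addr + i;
        assert(a < TOY_MEM_SIZE);
        unsigned sh = 2 * (unsigned)(a & 3u);
        shadow[a >> 2] = (uint8_t)((shadow[a >> 2] & ~(3u << sh)) | (st << sh));
    }
}

/* Cold path: handles misaligned, unaddressable and undefined bytes one at
   a time. In generated code this would live "off-trace". */
static uint32_t load4_slow(uint32_t addr) {
    uint32_t vbits = 0;
    slow_calls++;
    for (uint32_t i = 0; i < 4; i++) {
        if (byte_state((uint64_t)addr + i) == ST_UNDEF)
            vbits |= (uint32_t)0xFF << (8 * i);
        /* ST_NOACCESS would be reported as an error here; omitted in this toy. */
    }
    return vbits;
}

/* Hot path: the few instructions that would be inlined at each load site. */
static inline uint32_t load4(uint32_t addr) {
    if ((addr & 3u) == 0 && addr < TOY_MEM_SIZE &&
        shadow[addr >> 2] == WORD_ALL_DEFINED)
        return 0;                /* aligned, addressable, fully defined */
    return load4_slow(addr);     /* everything else goes off-trace */
}
```

The point of the encoding choice in this sketch is that the common case costs one alignment test, one range check, one shadow load and one compare, with no call at all; `slow_calls` only increases when an access falls off the hot path.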
Modifying the JIT, which until now has been simple, to support the arbitrary control flow this requires would be a large and complex task. Instead, we use a system of machine-code templates that allow precise control of branching and register use without significantly complicating the JIT. Making the templates architecture-neutral while still generating good straight-line code is an interesting challenge.
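One way to picture such a template, offered as a hypothetical sketch rather than the design described in the talk: the template is a fixed list of architecture-neutral ops over symbolic register slots and template-local labels, and a per-architecture backend binds the slots to concrete registers and spells out each op. The op set, `Insn`, `Backend`, `render`, `instantiate`, and the x86-flavoured pseudo-assembly they print are all invented for illustration; the output is not meant to be fed to an assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical template format (invented for illustration, not VEX's). */
typedef enum { OP_TEST_IMM, OP_CMP_IMM, OP_BNE, OP_BAE, OP_JMP,
               OP_SHADOW_BYTE, OP_CALL_SLOW, OP_LABEL } Op;
typedef enum { S_ADDR, S_TMP, NUM_SLOTS } Slot;  /* symbolic registers */
enum { L_SLOW, L_DONE };                         /* template-local labels */
static const char *const label_name[] = { "slow", "done" };

typedef struct { Op op; Slot a, b; unsigned long imm; } Insn;

/* Fast-path check for a 4-byte load. 0xAA stands for a hypothetical
   "all four bytes defined" shadow byte. */
static const Insn load4_tmpl[] = {
    { OP_TEST_IMM,    S_ADDR, S_ADDR, 3 },        /* misaligned?             */
    { OP_BNE,         S_ADDR, S_ADDR, L_SLOW },
    { OP_CMP_IMM,     S_ADDR, S_ADDR, 0x10000 },  /* outside shadowed range? */
    { OP_BAE,         S_ADDR, S_ADDR, L_SLOW },
    { OP_SHADOW_BYTE, S_TMP,  S_ADDR, 0 },        /* tmp = shadow[addr >> 2] */
    { OP_CMP_IMM,     S_TMP,  S_TMP,  0xAA },     /* fully defined?          */
    { OP_BNE,         S_TMP,  S_TMP,  L_SLOW },
    { OP_JMP,         S_TMP,  S_TMP,  L_DONE },   /* hot path ends here      */
    { OP_LABEL,       S_TMP,  S_TMP,  L_SLOW },   /* cold path               */
    { OP_CALL_SLOW,   S_ADDR, S_ADDR, 0 },
    { OP_LABEL,       S_TMP,  S_TMP,  L_DONE },
};

/* A backend binds each symbolic slot to a concrete register name. */
typedef struct { const char *reg[NUM_SLOTS]; } Backend;

static const char *label_of(const Insn *in) {
    assert(in->imm < sizeof label_name / sizeof label_name[0]);
    return label_name[in->imm];
}

/* Spell out one op for this toy x86-flavoured backend. Labels are made
   unique per instantiation by tagging them with `id`. */
static int render(const Insn *in, const Backend *be, unsigned id,
                  char *out, size_t n) {
    const char *a = be->reg[in->a], *b = be->reg[in->b];
    switch (in->op) {
    case OP_TEST_IMM: return snprintf(out, n, "  test %s, %lu\n", a, in->imm);
    case OP_CMP_IMM:  return snprintf(out, n, "  cmp %s, %lu\n", a, in->imm);
    case OP_SHADOW_BYTE:
        return snprintf(out, n, "  mov %s, %s\n  shr %s, 2\n"
                        "  movzx %s, byte [shadow + %s]\n", a, b, a, a, a);
    case OP_CALL_SLOW:
        return snprintf(out, n, "  mov arg0, %s\n  call load4_slow\n", a);
    case OP_BNE:   return snprintf(out, n, "  jne .T%u_%s\n", id, label_of(in));
    case OP_BAE:   return snprintf(out, n, "  jae .T%u_%s\n", id, label_of(in));
    case OP_JMP:   return snprintf(out, n, "  jmp .T%u_%s\n", id, label_of(in));
    case OP_LABEL: return snprintf(out, n, ".T%u_%s:\n", id, label_of(in));
    }
    return -1;
}

/* Emit one instantiation of a template into `out`. Returns the number of
   characters written, or -1 if `out` is too small. */
static int instantiate(const Insn *t, size_t len, const Backend *be,
                       unsigned id, char *out, size_t n) {
    size_t used = 0;
    for (size_t i = 0; i < len; i++) {
        int k = render(&t[i], be, id, out + used, n - used);
        if (k < 0 || (size_t)k >= n - used)
            return -1;
        used += (size_t)k;
    }
    return (int)used;
}
```

In this sketch all branching and register choices are fixed by the template, so the JIT only has to pick concrete registers for the slots and a unique `id` per load site; a real emitter would also decide where to place the cold block, for example after the end of the generated trace.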
Speakers: Julian Seward