Please note that this talk has been cancelled as the speaker is no longer able to attend FOSDEM.
See http://staff.bath.ac.uk/masjhd/MattColes.html for more background on the project.
Systems with large, multi-level cache hierarchies need cache performance to
be profiled at each level, because a significant performance disparity
remains between L2 and L3 caches. An example of such a system, and the focus
of this work, is the new Cavium ThunderX2 supercomputer at the
GW4 Isambard project. This paper details the methodology used to
extend Cachegrind, the Valgrind cache-profiling tool. The extension will
enable it to model the L2 and L3 caches separately, rather than as a
single last-level cache. Support will be added for the ARMv8
architecture, in order to model performance on the Isambard supercomputer, and
for x86_64, for comparative purposes. As a result of this extension,
the Isambard project will be able to optimise frequently used programs,
reducing the CPU time they use and therefore improving throughput on the system.
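
To illustrate what modelling L2 and L3 separately means in practice, the
following is a minimal Python sketch of a multi-level cache simulation. It is
not Cachegrind's implementation (Cachegrind is written in C), and the names
Cache and simulate, the LRU replacement policy, and the cache sizes are all
assumptions chosen for illustration; they do not reflect ThunderX2 parameters.

```python
from collections import OrderedDict


class Cache:
    """Hypothetical set-associative cache with LRU replacement."""

    def __init__(self, size_bytes, assoc, line_bytes):
        self.line_bytes = line_bytes
        self.assoc = assoc
        self.num_sets = size_bytes // (assoc * line_bytes)
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Record one access; return True on a hit, False on a miss."""
        line = addr // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.assoc:
            cache_set.popitem(last=False)  # evict least recently used line
        cache_set[line] = True
        return False


def simulate(levels, addresses):
    """Send each address down the hierarchy until some level hits.

    Returns one (hits, misses) pair per level, so L2 and L3 are
    reported separately instead of as a single last-level cache.
    """
    for addr in addresses:
        for cache in levels:
            if cache.access(addr):
                break
    return [(c.hits, c.misses) for c in levels]


if __name__ == "__main__":
    # Illustrative sizes only: 2, 4 and 8 lines of 64 bytes.
    hierarchy = [Cache(128, 2, 64), Cache(256, 4, 64), Cache(512, 8, 64)]
    trace = [line * 64 for line in range(6)] * 2  # touch 6 lines, twice
    for name, (hits, misses) in zip(("L1", "L2", "L3"),
                                    simulate(hierarchy, trace)):
        print(name, "hits:", hits, "misses:", misses)
```

In this small trace the six-line working set fits only in the third level, so
the second pass hits in L3 alone. A model with a single last-level cache could
not distinguish that case from one where the data is served by L2.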