llvm.mix

FOSDEM 2019

Automatic program specialization is a well-established area of computer-science research with many interesting practical applications, but to this day most existing specializers and partial evaluators are only applicable to one of a few high-level programming languages. The reason is that developing a specializer for a new language remains a from-scratch endeavor, and a hard one.

It is the lack of general-purpose, flexible program specialization tools that often leads projects to create custom just-in-time compilers for their specific use cases. These compilers, even if based on mature compiler infrastructures such as LLVM's, immediately become much harder to develop and maintain than the simple interpreters they make obsolete. In many cases, however, program specialization could bring the proverbial 80% of the benefits for a fraction of the cost, while preserving the simplicity and testability of the original design.

It is our belief that developing a specializer for a new language should be as easy as adding some supporting syntactic and semantic definitions to a language front-end and reusing an existing specializer in the middle end.

Such a language-independent specializer should preferably:

remove as much interpretation overhead as possible, and add no extra interpretation overhead of its own;
be able to produce both interpreters and compilers from the same code base, to enable gradual transition and to preserve debugging and testing properties of the original source code;
include a binding-time analysis component to simplify binding-time improvements of the source program;
be guided by annotations embedded in the source program as opposed to external annotations, to ease development and maintenance;
support multiple compilation stages to take advantage of as many specialization points as necessary;
be flexible enough with resource management to fit in both managed and unmanaged environments.
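
To make the first two requirements concrete, here is a minimal hand-written sketch in Python. The toy expression language and the names interpret and compile_expr are hypothetical and are not part of llvm.mix. The second function is the kind of code a specializer would ideally derive from the first one automatically: the expression tree is treated as static, the environment as dynamic, and all dispatch on the tree happens once, ahead of evaluation.

```python
from typing import Callable, Dict, Tuple, Union

# Hypothetical toy language: an expression is an int literal, a variable
# name, or a tuple (op, lhs, rhs) with op being "add" or "mul".
Expr = Union[int, str, Tuple[str, "Expr", "Expr"]]
Env = Dict[str, int]


def interpret(expr: Expr, env: Env) -> int:
    """Plain interpreter: inspects the tree on every evaluation."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    a, b = interpret(lhs, env), interpret(rhs, env)
    if op == "add":
        return a + b
    if op == "mul":
        return a * b
    raise ValueError(f"unknown operator: {op}")


def compile_expr(expr: Expr) -> Callable[[Env], int]:
    """Staged variant: the tree is static, the environment is dynamic.

    The tree is inspected once, here. The returned closure performs
    only the dynamic work (variable lookups and arithmetic).
    """
    if isinstance(expr, int):
        return lambda env: expr
    if isinstance(expr, str):
        return lambda env: env[expr]
    op, lhs, rhs = expr
    f, g = compile_expr(lhs), compile_expr(rhs)
    if op == "add":
        return lambda env: f(env) + g(env)
    if op == "mul":
        return lambda env: f(env) * g(env)
    raise ValueError(f"unknown operator: {op}")


program: Expr = ("add", ("mul", "x", "x"), 1)  # x * x + 1
compiled = compile_expr(program)               # pay the dispatch cost once
```

Both interpret(program, env) and compiled(env) compute the same value for any environment that binds x. Keeping the plain interpreter around as the reference for debugging and testing is what the second requirement in the list asks for.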

We will present the design and the prototype implementation of a multi-stage offline specializer generator that ticks most of these boxes. The generator is based on LLVM and runs at compile time, alongside the compilation of a source program. It is controlled by intrinsics and by function and parameter attributes in LLVM IR. Because it is implemented in the middle end of the LLVM optimizer, the specializer generator can be used with any language front-end. We will talk about some elements of its design, its limitations, and ways to improve it.
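
As an illustration of what a specializer generator produces, the following Python sketch shows a hand-written generating extension for a power function. It is only an analogy: Python source text stands in for LLVM IR, the names power, power_gen and load are hypothetical, and none of this reflects the actual llvm.mix intrinsics or attributes. The exponent n is assumed to be static (known at specialization time) and the base x dynamic.

```python
def power(x, n):
    """General program: both x and n are supplied at run time."""
    result = 1
    for _ in range(n):
        result *= x
    return result


def power_gen(n):
    """Hand-written generating extension for power (n static, x dynamic).

    Static control flow (the loop over n) is executed now. Operations on
    the dynamic x are not executed but emitted as residual source text.
    """
    body = "1"
    for _ in range(n):
        body = f"({body} * x)"
    return f"def power_{n}(x):\n    return {body}\n"


def load(source, name):
    """Turn residual source text into a callable (stand-in for a later stage)."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


residual = power_gen(3)
# residual is:
#   def power_3(x):
#       return (((1 * x) * x) * x)
cube = load(residual, "power_3")
```

The residual function contains no loop and no reference to n, which is the sense in which specialization removes interpretation overhead. An offline approach decides which parts are static and which are dynamic before any static value is known, which is why power_gen can be written (or generated) once and then run for any n.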

Speakers: Eugene Sharygin