Automatic program specialization is a well-established area of computer-science
research with many interesting practical applications, but to this day most
existing specializers and partial evaluators are only applicable to one of a
few high-level programming languages. The reason is that developing a
specializer for a new language is still a from-scratch endeavor, and a hard
one.
It is the lack of general-purpose, flexible program specialization tools that
often leads projects to create custom just-in-time compilers for their
specific use cases. These compilers, even if based on mature compiler
infrastructures such as LLVM, quickly become far harder to develop and
maintain than the simple interpreters they replace. In many cases, however,
program specialization could bring the proverbial 80% of the benefits for a
fraction of the cost, while maintaining the simplicity and testability of the
original design.
It is our belief that developing a specializer for a new language should be as
easy as adding some supporting syntactic and semantic definitions to a
language front-end and reusing an existing specializer in the middle end.
Such a language-independent specializer should preferably:
- remove as much interpretation overhead as possible, and add no extra
interpretation overhead of its own;
- be able to produce both interpreters and compilers from the same code base,
to enable gradual transition and to preserve debugging and testing
properties of the original source code;
- include a binding-time analysis component to simplify binding-time
improvements of the source program;
- be guided by annotations embedded in the source program as opposed to
external annotations, to ease development and maintenance;
- support multiple compilation stages to take advantage of as many
specialization points as necessary;
- be flexible enough with resource management to fit in both managed and
unmanaged environments.
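To make the first two requirements concrete, here is a minimal sketch in
Python of what specializing an interpreter means. It is not the LLVM-based
tool itself: the toy instruction set and the names `interpret`, `specialize`,
and `residual` are hypothetical, and the generating extension is written by
hand, whereas a specializer generator would aim to derive it automatically
from the interpreter.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy instruction: (opcode, operand).
Instr = Tuple[str, int]


def interpret(program: List[Instr], x: int) -> int:
    """Plain interpreter: decodes every instruction on every call."""
    acc = x
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown opcode: {op}")
    return acc


def specialize(program: List[Instr]) -> Callable[[int], int]:
    """Hand-written generating extension for `interpret`.

    The program is treated as static and x as dynamic. Opcode dispatch
    happens once, here; the residual function contains only arithmetic.
    """
    lines = ["def residual(x):", "    acc = x"]
    for op, arg in program:
        if op == "add":
            lines.append(f"    acc += {arg!r}")
        elif op == "mul":
            lines.append(f"    acc *= {arg!r}")
        else:
            raise ValueError(f"unknown opcode: {op}")
    lines.append("    return acc")

    # Compile the generated source and return the residual function.
    namespace: Dict[str, object] = {}
    exec("\n".join(lines), namespace)
    return namespace["residual"]  # type: ignore[return-value]


program = [("mul", 3), ("add", 4), ("mul", 2)]
compiled = specialize(program)

# The interpreter and the residual program agree on the same input.
assert compiled(5) == interpret(program, 5)
```

The same source still runs as an ordinary interpreter through `interpret`,
which is the property the second requirement asks for: the interpreter stays
available for debugging and testing, while the specialized version removes
the dispatch loop.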
We will present the design and prototype implementation of a multi-stage
offline specializer generator that ticks most of these boxes. The generator
is based on LLVM and runs at compile time, alongside the compilation of the
source program. It is controlled by intrinsics and by function and parameter
attributes in LLVM IR. Because it is implemented in the LLVM middle end, the
specializer generator can be used with any language front-end. We will
discuss some elements of its design, its limitations, and ways to improve
it.