We present Futhark, a purely functional array language, along with its optimising GPU-targeting compiler. We focus in particular on the language tradeoffs needed to generate high-performance GPU code efficiently from a high-level parallel language. We also demonstrate (nested) data-parallel array programming, a paradigm that enables concise programming of massively parallel systems. We show how Futhark code can be easily integrated with larger applications written in other languages. Finally, we report results showing that Futhark is able to match the performance of hand-written code on various published benchmarks.
GPUs and other massively parallel systems are now common, yet
programming them is often a painful experience. Languages tend to be
low-level and fragile, with careful hand-optimisation necessary to
obtain good performance. The programmer is frequently forced to write
highly coupled code with little modularity. The high-level languages
that exist, usually functional in nature, are often insufficiently
flexible or perform poorly in practice. We present our work on a
programming language that seeks a common ground between imperative and
functional approaches.
Futhark is a small programming language designed to be compiled to
efficient GPU code. It is a statically typed, data-parallel, and
purely functional array language, and comes with a heavily optimising
ahead-of-time compiler that generates GPU code via OpenCL. Futhark is
not designed for graphics programming, but instead uses the compute
power of the GPU to accelerate data-parallel array computations. We
support regular nested data-parallelism, as well as a form of
imperative-style in-place modification of arrays, while still
preserving the purity of the language via the use of a uniqueness type
system.
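
To make "regular nested data-parallelism" concrete, the following is a minimal sketch written in plain Python rather than in Futhark syntax. It only illustrates the shape of such a program: an outer map over the rows of a matrix, where each row runs an inner map and reduction. The names `dot` and `matvec` are hypothetical and chosen for illustration. Python executes this sequentially; the point is that both levels are expressed with map and reduce patterns that a data-parallel compiler could in principle run in parallel.

```python
from functools import reduce
from operator import add, mul


def dot(xs, ys):
    # Inner parallel pattern: an elementwise map (multiplication)
    # followed by a reduction (summation).
    return reduce(add, map(mul, xs, ys), 0)


def matvec(matrix, vec):
    # "Regular" nesting: every inner array (row) has the same length.
    assert all(len(row) == len(vec) for row in matrix)
    # Outer parallel pattern: a map over rows, each of which runs `dot`.
    return [dot(row, vec) for row in matrix]


print(matvec([[1, 2, 3], [4, 5, 6]], [1, 0, 2]))  # prints [7, 16]
```

Because neither function mutates its inputs, the rows can be processed in any order, which is the property that a purely functional formulation makes explicit.
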
The Futhark language and compiler are an ongoing research project. The
compiler can already handle nontrivial programs, which then run on real
GPUs at high speed. It employs a set of optimisations (fusion,
flattening, distribution, tiling, etc.) to shield the programmer from
having to know the details of the underlying hardware. The Futhark
language itself is still very spartan: because the basic design
criteria require the ability to generate high-performance GPU code,
it takes more effort to support language features that are common in
languages with more forgiving compilation targets. Nevertheless,
Futhark has been used to port several real-world benchmark
applications, with performance comparable to the original hand-written
GPU (OpenCL or CUDA) code.
Futhark is not intended to replace existing general-purpose languages.
Our intended use case is for Futhark to be used only for the relatively
small but compute-intensive parts of an application. The Futhark
compiler generates code that can be easily integrated with non-Futhark
code. For example, you can compile a Futhark program to a Python
module that internally uses PyOpenCL to execute code on the GPU, yet
looks like an ordinary Python module from the outside.