Serverless computing facilitates the use of resources without the burden of administering and maintaining infrastructure. This simplification of IaaS appears ideal in theory, but it presents both providers and users with several challenges: providers aim to reduce infrastructure maintenance overheads, while users require isolation, flexibility, and programming freedom.
Serverless deployments are mostly backed by sandboxed containers. To enable programming freedom for users, providers allow the use of containers for function deployment; however, to ensure strict isolation, these containers are sandboxed in VMs. As a result, this bloated stack brings complicated maintenance costs: (a) several layers of abstraction between the user function to be executed and the actual execution environment; (b) an increased attack surface; (c) increased request-to-execution time; (d) reduced feature availability for functions (e.g., hardware acceleration).
Unikernels promise fast boot times, a small memory footprint, and stronger security, but fall short in terms of manageability. Additionally, existing serverless frameworks support only containers. Moreover, unikernels provide a different environment for applications, with limited or no support for widely used libraries and OS features. This issue is even more apparent in the case of ML/AI workloads: ML/AI libraries are often dynamically linked and have numerous dependencies, a model that directly contradicts the statically linked nature of unikernels. Finally, hardware acceleration is almost non-existent in unikernel frameworks, mainly due to the absence of suitable virtualization solutions for such devices.
In this talk, we present the design of a flexible serverless framework for the cloud and the edge, backed by unikernels that can access hardware accelerators. We go through the components that comprise the framework and elaborate on the challenges in building such a software stack: we first present an overview of the necessary components of a serverless framework; we then focus on the function execution layer, based on two popular unikernel frameworks; finally, we present a hardware acceleration abstraction that exposes semantic acceleration functionality to workloads running on top of this framework.
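To make "semantic acceleration functionality" more concrete, the following is a minimal, hypothetical C sketch of what such an abstraction could look like, under the assumption that the guest invokes a high-level operation (here, image classification) and the abstraction dispatches it to whichever backend plugin the host provides (GPU, FPGA, or a CPU fallback). All names used here (acc_session, acc_image_classify, cpu_stub_classify) are illustrative assumptions, not the framework's actual API.

```c
/* Hypothetical sketch of a "semantic" acceleration API: the workload calls a
 * high-level operation and a host-side plugin selects the actual backend
 * (GPU, FPGA, or a CPU fallback). Names are illustrative, not a real API. */
#include <stdio.h>
#include <stddef.h>

/* Operation exposed to the function/unikernel: semantic, hardware-agnostic. */
typedef int (*classify_fn)(const void *img, size_t len,
                           char *label, size_t label_len);

/* A backend plugin: in a real deployment this could wrap a GPU runtime or an
 * FPGA bitstream; here it is a trivial CPU stub returning a fixed label. */
static int cpu_stub_classify(const void *img, size_t len,
                             char *label, size_t label_len)
{
    (void)img; (void)len;
    snprintf(label, label_len, "cat"); /* placeholder result */
    return 0;
}

/* The abstraction layer dispatches the semantic call to whichever plugin
 * the host has registered for this session. */
struct acc_session { classify_fn classify; };

int acc_image_classify(struct acc_session *s, const void *img, size_t len,
                       char *label, size_t label_len)
{
    return s->classify(img, len, label, label_len);
}

int main(void)
{
    struct acc_session s = { .classify = cpu_stub_classify };
    unsigned char fake_image[16] = {0};
    char label[64];

    if (acc_image_classify(&s, fake_image, sizeof(fake_image),
                           label, sizeof(label)) == 0)
        printf("classified as: %s\n", label);
    return 0;
}
```

The point of the sketch is the design choice, not the stub itself: the workload only sees the semantic call, so the same unikernel image can run unmodified whether or not an accelerator is present on the host.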
We will conclude with a short demo of the working components and a discussion of the challenges and trade-offs of this approach.