Over several years of work on the Checkpoint/Restore In Userspace (CRIU) project, we have collected a number of interesting problems that are worth researching and implementing. They include purely technical tasks at all levels, from hacking the kernel up to CRIU itself, as well as tricky math problems. This talk covers the most interesting items from that list.
Checkpoint and restore is the task of gathering as much information as possible about running Linux processes, saving it as a set of files, and later recreating the processes in their original state so that they do not notice the change. There have been many implementations of this technology, but nowadays the mainstream one is the CRIU project. Over several years of work on it we have come across a number of interesting things that could be implemented as part of, or on top of, core CRIU, but that have been put aside for later.
One of the interesting technical questions is how to organize logging. Logging is critical in C/R: if a checkpoint or restore fails without the most detailed traces of the process, it is extremely hard to debug what went wrong and which steps led to the failure. On the other hand, generating too many log messages may slow things down significantly. How do we find the balance between fast and informative logs, and how do we generate the messages in the fastest way possible?
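One candidate direction for the "fastest way possible" question is to postpone string formatting: the hot path only records which format string was used and the raw arguments, and the text is produced later, when someone actually reads the log. The sketch below illustrates that idea only; the `BinaryLog` class, its record layout, and the integers-only restriction are assumptions made for this example and do not describe CRIU's logging code.

```python
import struct


class BinaryLog:
    """Hypothetical append-only log: stores a format id plus raw integer args."""

    def __init__(self):
        self.formats = []        # format strings, indexed by id
        self.ids = {}            # format string -> id
        self.buf = bytearray()   # packed records

    def log(self, fmt, *args):
        """Hot path: no string formatting, only a lookup and a few packs."""
        fmt_id = self.ids.get(fmt)
        if fmt_id is None:
            fmt_id = self.ids[fmt] = len(self.formats)
            self.formats.append(fmt)
        # Record layout (an assumption of this sketch):
        # u16 format id, u16 argument count, then one signed 64-bit int per argument.
        self.buf += struct.pack("<HH", fmt_id, len(args))
        for arg in args:
            self.buf += struct.pack("<q", arg)

    def decode(self):
        """Cold path: turn the packed records back into text lines."""
        lines, pos = [], 0
        while pos < len(self.buf):
            fmt_id, count = struct.unpack_from("<HH", self.buf, pos)
            pos += 4
            args = tuple(
                struct.unpack_from("<q", self.buf, pos + 8 * i)[0]
                for i in range(count)
            )
            pos += 8 * count
            lines.append(self.formats[fmt_id] % args)
        return lines


log = BinaryLog()
log.log("restored pid %d with %d fds", 42, 7)
log.log("done")
decoded = log.decode()
```

The trade-off is that the log is no longer human-readable without a decoder, and arguments such as strings need a richer record format than the one assumed here.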
An example of a kernel-level issue: to get information about a process, CRIU makes a LOT of system calls, and a huge amount of time is spent reading various /proc files. Optimizing this would make a tempting kernel feature, but what is the best way? Speeding up /proc itself, adding the ability to merge several syscalls into one, or developing yet another subsystem that reports information about processes faster and more flexibly than /proc does?
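To make the cost concrete, here is a minimal sketch of what gathering per-process information from /proc looks like from userspace. The list of files and the "three syscalls per file" lower bound are illustrative assumptions, not a description of what CRIU actually reads; the point is only that every file costs at least an open, a read and a close, plus text parsing, and that this repeats for every process.

```python
import os

# Files under /proc/<pid>/ that a checkpointing tool might plausibly read.
# This list is an assumption for illustration, not CRIU's actual set.
PROC_FILES = ("stat", "status", "maps")


def parse_status(text):
    """Parse the "Key: value" lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def collect(pid):
    """Read every file in PROC_FILES for one pid.

    Returns (contents, syscall_estimate), where the estimate counts
    open + read + close per file as a lower bound.
    """
    contents = {}
    syscalls = 0
    for name in PROC_FILES:
        path = os.path.join("/proc", str(pid), name)
        with open(path) as f:
            contents[name] = f.read()
        syscalls += 3
    return contents, syscalls


sample = "Name:\tbash\nPid:\t42\nPPid:\t1\n"
parsed = parse_status(sample)
```

On a Linux machine, `collect(os.getpid())` gives a feel for the per-process cost; multiplying it by the number of processes and by the real number of files involved shows why a faster or batched interface is tempting.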
As far as math problems are concerned, the best one we have is this: given a graph of objects and rules for creating vertices and edges, guess (or compute) the sequence of rules that would generate this graph out of an empty one. In Linux terms: how do we call fork(), setsid(), open() and other syscalls to generate a process tree with its resources in the state we want?
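A toy version of this problem can be sketched as a brute-force search. The model below is a deliberate simplification and every name in it (`plan`, the process labels, the two rules) is an assumption of this example: a process is described only by its parent and its session, fork() makes the child inherit the parent's current session, and setsid() makes the caller the leader of a new session. Real setsid() has further restrictions (for instance, it fails for a process group leader), and real process trees carry many more resources.

```python
from collections import deque


def plan(target, root):
    """Breadth-first search for a shortest sequence of fork/setsid steps.

    target maps process name -> (parent name, session leader name) and
    must include the root. Returns a list of ("fork", parent, child) and
    ("setsid", proc) steps, or None if the target cannot be reached
    under these simplified rules.
    """
    start = frozenset({(root, None, root)})  # root starts as its own session leader
    goal = frozenset((n, p, s) for n, (p, s) in target.items())
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        sid_of = {name: sid for name, _, sid in state}
        moves = []
        # Rule 1: fork(). The child inherits the parent's current session.
        for child, (parent, _) in target.items():
            if child not in sid_of and parent in sid_of:
                new = state | {(child, parent, sid_of[parent])}
                moves.append((new, ("fork", parent, child)))
        # Rule 2: setsid(). Only tried for processes that must end up as leaders.
        for name, parent, sid in state:
            if sid != name and target[name][1] == name:
                new = (state - {(name, parent, sid)}) | {(name, parent, name)}
                moves.append((new, ("setsid", name)))
        for new, step in moves:
            if new not in seen:
                seen.add(new)
                queue.append((new, steps + [step]))
    return None


# A leads its own session, yet its child B must stay in init's session,
# so B has to be forked before A calls setsid(), and C after.
target = {
    "init": (None, "init"),
    "A": ("init", "A"),
    "B": ("A", "init"),
    "C": ("A", "A"),
}
steps = plan(target, "init")
```

Even this tiny example shows the ordering constraints that make the problem interesting: the moment at which a rule is applied matters, not just which rules are applied. Exhaustive search does not scale to realistic trees, which is why finding a better method is an open question.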
These and some other brain-teasers will be presented in the talk.
Speakers: Pavel Emelyanov