Sorting is one of the most common algorithms used in programming, and virtually every standard library contains a routine for it. Despite also being one of the oldest problems out there, surprisingly large improvements are still being found. Some of these are fundamental novelties, and others are optimizations matching the changing performance landscape in modern hardware.
In this talk we present Glidesort, a general purpose in-memory stable comparison sort. It is fully adaptive to both pre-sorted runs in the data similar to Timsort, and low-cardinality inputs similar to Pattern-defeating Quicksort, making it to our knowledge the first practical stable sorting algorithm fully adaptive in both measures. Glidesort achieves a 3x speedup over a Rustβs standard library Timsort routine on sorting random 32-bit integers, with the speedup breaking the order of magnitude barrier for realistic low-cardinality distributions. It achieves this without the use of SIMD, processor-specific intrinsics or assumptions about the type being sorted: it is a fully generic sort taking an arbitrary comparison operator.
Using Glidesort as the motivating example we discuss the principles of efficient stable in-memory partitioning and merging on modern hardware. In particular attention is paid to eliminating branches and interleaving independent parallel loops to efficiently use our modern deeply-pipelined superscalar processors. The lessons learned here are widely applicable to efficient data processing outside of sorting.
We also go into more detail on the various intricacies that Glidesort has to overcome in our Rust implementation. In particular how to deal with splitting and concatenating slices in a safe manner using GhostCell-style branding, the problems faced when dealing with panics in the comparison operator, and potential issues with interior mutability in comparison operators.