The Unix dgsh shell provides an expressive way to construct sophisticated and efficient data processing pipelines from standard Unix tools as well as third-party and custom-built components. Dgsh allows the specification of pipelines of non-uniform, non-linear operations. For example, tee can feed three processes, whose outputs can then be collected by paste. The pipelines form a directed acyclic process graph, which is typically executed on multiple processor cores, thus increasing the task's processing throughput. We will see how to use dgsh in practice through a number of general data processing and domain-specific examples, and how to adapt tools for use with dgsh.
Speakers: Diomidis Spinellis
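
To make the tee and paste example from the abstract concrete, here is a minimal sketch of the same fan-out and fan-in process graph, wired by hand with named pipes in plain POSIX shell. It is not dgsh syntax and does not use dgsh. It only illustrates the kind of graph that dgsh is meant to express more directly. The function name fan_out_fan_in and the three chosen commands (first line, last line, line count) are hypothetical, picked purely for illustration, and mktemp -d is assumed to be available.

```sh
#!/bin/sh
# Hypothetical illustration: emulate a "tee -> three processes -> paste"
# graph with named pipes. This is NOT dgsh syntax.
fan_out_fan_in() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/in1" "$dir/in2" "$dir/in3" \
           "$dir/out1" "$dir/out2" "$dir/out3" || return 1

    # Three concurrent processes, each reading its own copy of the input.
    sed -n '1p' <"$dir/in1" >"$dir/out1" &   # first line
    sed -n '$p' <"$dir/in2" >"$dir/out2" &   # last line
    grep -c ''  <"$dir/in3" >"$dir/out3" &   # number of lines

    # Fan-in: paste joins the three results into one tab-separated line.
    paste "$dir/out1" "$dir/out2" "$dir/out3" &

    # Fan-out: tee copies standard input to all three processes.
    tee "$dir/in1" "$dir/in2" >"$dir/in3"

    wait
    rm -rf "$dir"
}

# Usage: three input lines in, one summary line out.
printf 'one\ntwo\nthree\n' | fan_out_fan_in
```

The three middle processes run concurrently, so the operating system is free to schedule them on separate cores. The manual bookkeeping of pipes and background jobs is the part that a shell built around process graphs aims to remove.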