There are broadly two types of people in the scientific computing world: those who produce data and those who consume it. The former have models and generate data from those models, a process known as 'simulation'; the latter have data and infer models from the data ('analytics'). Simulation practitioners often come from disciplines such as Engineering, Physics, or Climatology, while analytics practitioners are most often active in Remote Sensing, Bioinformatics, Sociology, or Management.
Simulations often require large amounts of computation, so they are typically run on generic High-Performance Computing (HPC) infrastructures built on clusters of powerful high-end machines linked together with high-bandwidth, low-latency networks. The cluster is often augmented with hardware accelerators (co-processors such as GPUs or FPGAs) and a large, fast parallel filesystem, all set up and tuned by systems administrators. By contrast, analytics focuses on the storage and access of data, so it is often performed on a BigData infrastructure suited to the problem at hand. Such infrastructures offer specific data stores and are often installed in a more or less self-service way on a public or private 'Cloud', typically built on top of 'commodity' hardware.
Those two worlds, the world of HPC and the world of BigData, are slowly but surely converging. The HPC world realises that there is more to data storage than just files and that 'self-service' ideas are tempting. Meanwhile, the BigData world realises that co-processors and fast networks can really speed up analytics. And indeed, all major public Cloud services now have an HPC offering, while many academic HPC centres are starting to offer Cloud infrastructures and BigData-related tools.
This talk will focus on the latter point of view and review the tools originating from the BigData world, and the ideas from the Cloud, that can be implemented in an HPC context to broaden the scientific computing offering in universities and research centres.
Speaker: Damien François