During FOSDEM 2021, we presented in the same event the LUMI supercomputer and we discussed about the Open Software Platform for GPU-accelerated Computing by AMD (ROCm) ecosystem, how to port CUDA codes to Heterogeneous Interface for Portability (HIP), and some performance results based on the utilization of NVIDIA V100 GPU. In this talk we assume the audience is familiar with the content of the previous presentation. One year later, we have executed many codes on AMD MI100 GPU, tuned the performance on various codes and benchmarks, utilized and tuned a few programming models such as HIP, OpenMP offloading, Kokkos, and hipSYCL on AMD MI100 and compared their performance additionally with NVIDIA V100 and NVIDIA A100 (including CUDA). Furthermore, a new open source software is released by AMD, called GPUFort, to port Fortran+CUDA/OpenACC codes to Fortran+HIP for AMD GPUs. In this talk we present what we learned through our experience, how we tune the codes for MI100, how we expect to tune them in the future for LUMI GPU, the AMD MI250X, compare the previously mentioned programming models on some kernels across the GPUs, present a performance comparison for single precision benchmark, discuss the updated software roadmap, and a brief update for the porting workflow.
Speakers: Georgios Markomanolis