The SHAREing Accelerated Compute Hub aims to share knowledge and skills relating to running, programming, and utilising shared accelerated compute platforms.
A key gap that we have identified in system administration knowledge is understanding how to monitor the utilisation, efficiency, and performance of software on accelerated compute. There is a plethora of tools that can be run as part of a user job to profile its efficiency, and many tools that assess the energy consumption of a cluster as a whole or the long-term utilisation of individual accelerators. However, there is no clear set of best practices for holistically connecting systems-level data with user jobs, so that users can monitor their resource utilisation, and be alerted to problems with it, without needing to explicitly profile each job.
This workshop brings together experts from hardware and software vendors and HPC centres to share experiences and best practices on how to connect hardware data to user workloads.
Confirmed speakers include:
- Jorda Polo, AMD
- Jan Eitzinger and Christoph Kluge, NHR@FAU (Cluster Cockpit)
- Mark Dixon, Durham University
- Mahendra Paipuri, CNRS (CEEMS)
- Rudy Shand, Linaro
We welcome both in-person and remote participation in this event. The Zoom link to participate remotely will be sent to registered participants in advance of the event.