(Bi-)Weekly meeting green compute team
Europe/London
Description
Live notes: see the Teams channel (available on demand for non-UofM participants).
Zoom link:
https://cern.zoom.us/j/69108649411?pwd=BhqU0RERtnPf2gtK872m4gSM6izuZx.1
In this meeting, we will discuss the status of the joint work with Glasgow and its connections to the HEPScore/HEPBenchmarks work.
[obtained by feeding the Zoom transcript into Gemini 2.5 Pro + some manual editing, prompt: Please produce meeting minutes and action items from this transcript, do not add any extra information]
Meeting Summary & Goals
- The meeting was a follow-up to discuss power accounting work and interactions with HEPScore/HEPBenchmarks.
- The overall goal is to consolidate methods for measuring power on compute nodes at various sites.
- The data will eventually be made available to HEPScore/HEPBenchmarks, with the aim of Manchester becoming a new testing site.
- The agreed-upon approach for now is for each site to run its own scripts and write the power values to a file, creating a uniform interface for data collection jobs.
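A minimal sketch of what the "write the power values to a file" interface could look like is given below. The minutes do not fix a file format, so the JSON layout, the field names (timestamp, power_watts, source) and the function names are all assumptions made for illustration, not an agreed format.

```python
import json
import os
import tempfile
import time


def write_power_reading(path, watts, source):
    """Atomically write one point-in-time power reading as JSON.

    The field names below are hypothetical; a real format would need
    to be agreed between sites.
    """
    record = {
        "timestamp": int(time.time()),  # Unix time of the reading
        "power_watts": float(watts),    # node power draw in watts
        "source": source,               # site-specific tool that produced the value
    }
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then rename it,
    # so a job reading the file never sees a half-written record.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(record, tmp)
    os.replace(tmp_path, path)
    return record


def read_power_reading(path):
    """Read back the latest reading, as a data collection job might."""
    with open(path) as f:
        return json.load(f)
```

With this layout, each site keeps its own measurement script and only the final write step is shared, so a data collection job needs to know nothing beyond the file location and the field names.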
Data Collection & Scripts
- Emanuele has provided sample data, which is point-in-time data, as opposed to the time-series data the team has been working with. He has also added the output of his scripts to the conference page.
- The team discussed the frequency of data collection, noting some data (like which jobs are running) changes infrequently.
- The starting point for the technical work is to replicate what Emanuele is doing and determine what information can already be gathered from Prometheus.
- The group aims to unify data formats or find a common ground.
- Alessandra noted that Tier 2 sites also need this functionality to get values onto worker nodes. She suggested:
  - Scripts should be shared and have options for different tools (e.g., ipmitool, BMC Prometheus).
  - BMC values should be compared with ipmitool output on at least one machine to check that they are consistent.
- Natalia is preparing a repository (in the benchmark repository) to collect the different power-monitoring methods currently used at sites like DESY and Glasgow. It should be available this week.
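The consistency check Alessandra suggested, comparing BMC values with IPMI tool output on one machine, could be sketched as follows. It assumes the two sets of samples (in watts) have already been collected over the same time window; how they are collected is left to the site scripts. The function names and the 5% tolerance are assumptions for illustration, not values agreed in the meeting.

```python
from statistics import mean


def relative_difference(bmc_watts, ipmi_watts):
    """Relative difference between the means of two sets of power samples."""
    bmc_mean = mean(bmc_watts)
    ipmi_mean = mean(ipmi_watts)
    largest = max(bmc_mean, ipmi_mean)
    if largest == 0:
        return 0.0  # both sources report zero, so they agree
    return abs(bmc_mean - ipmi_mean) / largest


def readings_consistent(bmc_watts, ipmi_watts, tolerance=0.05):
    """True if the two sources agree within the given relative tolerance.

    The default tolerance of 5% is a hypothetical threshold.
    """
    return relative_difference(bmc_watts, ipmi_watts) <= tolerance
```

Comparing means over a window rather than single samples makes the check less sensitive to the two tools sampling at slightly different moments.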
New Student Project at Manchester (Rosie Schiffmann)
- A new student with a physics background is starting next Monday for an 8-week project.
- Several potential tasks were discussed:
  - Integrating a new workload into HEPScore: Deemed too complex and time-consuming for this project.
  - Data visualization: Discussed, but noted that tools like Grafana already exist. The value would be in exploring the data to see what can be visualized (e.g., peak vs. off-peak usage).
  - GPU vs. CPU simulation: Considered an interesting and useful task.
- Decision: The student will work on restructuring and tidying up a set of existing scripts (from a previous student) for running CodeCarbon/RAPL on individual user workflows. This provides a lightweight tool for users to check their own jobs, separate from the full HEPScore integration.
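As a rough illustration of the RAPL side of such a lightweight tool, the sketch below reads the Linux powercap energy counter before and after a command and converts the difference to joules. It is not taken from the existing scripts: the function names are hypothetical, the choice of the intel-rapl:0 domain (CPU package 0) is an assumption, and reading the counter may require elevated permissions on some systems.

```python
import subprocess
from pathlib import Path

# Linux powercap directory for RAPL package 0 (assumed domain for this sketch).
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name):
    """Read an integer microjoule value from the RAPL domain directory."""
    return int((RAPL_DOMAIN / name).read_text())


def energy_joules(start_uj, end_uj, max_range_uj):
    """Energy between two counter readings, allowing for one wraparound."""
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped around after reaching max_energy_range_uj.
        delta += max_range_uj
    return delta / 1e6  # microjoules to joules


def measure(command):
    """Run a command and return the package energy it used, in joules."""
    start = read_uj("energy_uj")
    subprocess.run(command, check=True)
    end = read_uj("energy_uj")
    return energy_joules(start, end, read_uj("max_energy_range_uj"))
```

The counter covers the whole CPU package, so on a shared node the result includes other users' activity; it is most meaningful for a job that has the machine to itself.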
Action Items
- Michael, Sakshi: Examine the sample data provided by Emanuele to identify which metrics can be replicated using existing tools like Prometheus.
- Natalia: Share the new repository for collecting power-monitoring scripts once it is available (expected this week).
- Michael, Emanuele, Sakshi, Alessandra: Schedule a meeting for early next week to discuss progress on data replication and format unification. Plan to meet weekly.
- Caterina: Distill the meeting discussion into minutes and share them.
- Natalia: Consider a small, useful task for the new student, possibly related to Grafana or another existing tool, for once the student is done with Luis’s scripts.
- Caterina: Have the new student begin by reading and summarizing the thesis of the previous student, Luis.