(Bi-)Weekly meeting of the green compute team
Live notes: see Teams channel / on demand for non-UofM.
Zoom link:
https://cern.zoom.us/j/64447396002?pwd=M5MzzOTnDcdNDuNDPFY7slbvvhCMnt.1
In this meeting, we will discuss the status of the joint work with Glasgow, and connections to the HEPScore/HEPBenchmarks work.
# UofM/Glasgow/CERN/QMUL Green Compute Meeting
## 30/7/2025 1pm BST
Present: Michael, Rosie, Emanuele, Robert, Sakshi, Alessandra, Sudha
## Agenda
13:00 BST / 14:00 CERN as usual.
1. Progress on Prometheus data processing (Sakshi, Myself)
* Introduction by Sudha
2. Progress on Glasgow Prometheus replication (Emanuele)
3. Brief discussion of RO Crates and standardised reporting, especially relating to runs (Michael)
4. Who is around, and when / next meeting
5. AOB
## Progress on Prometheus data processing (Sakshi, Myself)
### Discussion:
My notes:
* Created a notebook and performed analysis on a week's data.
* Able to do analysis to an extent, which is a start, and has gained some ideas about what would be useful to correlate power usage against.
* Would like guidance on what to analyse, and how.
Robert/Michael:
* Robert Frank - list of different metrics
* Michael - It's a collection of search terms.
Sakshi - how to progress?
Robert - we need to be careful selecting the various pieces, and some may require summarisation.
Specific Actions:
* Sakshi - to select a collection of metrics
* Michael - to collect them for Sakshi
* Both - discussion of specific steps
### Summary (by GPT from Transcript)
Sakshi:
* Generated a notebook in the metrics exporter directory for initial data analysis (approx. one week of data).
* Analysed ~800 CSV files from Manchester Prometheus; only one unlabeled metric was present.
* Received an additional Prometheus file from Robert (1,000 lines with multiple metrics, some sparse).
* Next step: review the list of metrics to identify which should be extracted for further analysis.
Robert & Michael:
* Robert clarified that the file Sakshi analysed was a list of available metrics (snapshot, not full time-series).
* Michael to pull out selected metrics from Prometheus once Sakshi provides the list.
* Post-processing will be required (e.g. summing CPU core metrics).
* Key next step: Sakshi to select useful metrics → Michael to extract them → follow-up discussion.
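Not something agreed in the meeting, but as a rough illustration of the extract-then-summarise step discussed above, the sketch below pulls one metric over a time range from the Prometheus HTTP API and sums the per-core series into a per-host series. The endpoint, time window and the `node_cpu_seconds_total` query are placeholder assumptions, not the metric list Sakshi will produce.

```python
# Minimal sketch: pull one metric over a time range from the Prometheus HTTP
# API and sum per-core series into a per-host series. Endpoint and query are
# illustrative assumptions only.
import requests
import pandas as pd

PROM_URL = "http://prometheus.example.org:9090"  # placeholder endpoint

def query_range(promql, start, end, step="5m"):
    """Run a range query and return the raw 'matrix' result."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query_range",
        params={"query": promql, "start": start, "end": end, "step": step},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["data"]["result"]

def to_frame(result):
    """Flatten the Prometheus matrix result into a long-form DataFrame."""
    rows = []
    for series in result:
        labels = series["metric"]
        for ts, value in series["values"]:
            rows.append({**labels,
                         "time": pd.to_datetime(float(ts), unit="s"),
                         "value": float(value)})
    return pd.DataFrame(rows)

# Example post-processing: non-idle CPU rate per core, summed over cores (and
# modes) to give one utilisation series per host.
df = to_frame(query_range(
    'rate(node_cpu_seconds_total{mode!="idle"}[5m])',
    "2025-07-21T00:00:00Z", "2025-07-28T00:00:00Z"))
per_host = df.groupby(["instance", "time"])["value"].sum().unstack("instance")
```

The same groupby/sum post-processing applies to data already exported to CSV, once the label and timestamp columns are identified.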
## Introduction by Sudha
- Hello/introductions.
Sudha introduced herself:
* New to Queen Mary University of London and GridPP (joined June).
* Experimental particle physicist with background in ATLAS and CMS trigger software.
* Will work on sustainability studies.
## Progress on Glasgow Prometheus replication (Emanuele)
Key points:
* Looking good overall; has been fighting with replication + filtering, so compromising on some points.
* Will re-enable this afternoon.
* Replicating from the private instance to a shareable instance.
* Replicating content between the two instances is made trickier by wanting to filter and anonymise the data.
  - A decision is needed around hostnames etc.
  - It might be better for us to anonymise those when we extract data (see the sketch at the end of this section).
* A variety of metrics added relating to Condor jobs.
GPT Summary:
Emanuele:
* Significant challenges replicating and sanitising the Glasgow Prometheus database (~300 GB).
* Initially attempted to anonymise hostnames but Prometheus made this impractical.
* Decision: retain full hostnames (privacy concern deemed minimal).
* Will wipe and re-import the database, enabling external access again.
* Additional metrics to be exposed: RAM totals, CPU totals, etc. for better usage calculations.
Note:
* Timeline: expected completion by tonight or tomorrow.
* Will be away for 2–3 weeks starting next week (limited email availability).
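Relating to the anonymise-at-extraction idea above, here is a minimal sketch of what pseudonymising hostnames at extraction time could look like, assuming the long-form DataFrame layout from the earlier extraction sketch. The column name and salt are placeholders, and this is only one option; Glasgow ultimately decided to retain full hostnames.

```python
# Minimal sketch: pseudonymise hostnames at extraction time by replacing the
# `instance` label with a short salted hash. Column name and salt are
# placeholder assumptions; Glasgow's final decision was to keep full hostnames.
import hashlib
import pandas as pd

SALT = "change-me"  # placeholder; keep private so hashes cannot be guessed

def pseudonymise(df: pd.DataFrame, column: str = "instance") -> pd.DataFrame:
    """Return a copy of df with `column` replaced by salted hash labels."""
    out = df.copy()
    out[column] = out[column].map(
        lambda host: "node-" + hashlib.sha256((SALT + str(host)).encode()).hexdigest()[:8]
    )
    return out

# Example usage with the `df` produced in the earlier extraction sketch:
# shared_df = pseudonymise(df)
# shared_df.to_csv("metrics_anonymised.csv", index=False)
```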
## Who is around, and when
Away for longer periods in August:
* Rosie: Last day mid-August
* Emanuele: Away: w/c 4th, 11th, 18th August
* Michael: Away: w/c 11th, 18th, 25th August
* Caterina: Away: w/c 18th, 25th Aug, 1st Sept
* Alessandra: Away 9–13 August and 21 Aug–4 Sept.
Generally around:
* Sudha: Mostly available in August (occasional long weekends).
* Sakshi: Available in August; possible time off in September.
* Robert: Generally available; cannot guarantee.
## Brief discussion of RO Crates and standardised reporting, especially relating to runs (Michael)
* Context: Desire for reproducible and standardised sharing of Prometheus-derived datasets.
* Rosie: Has been scripting Prometheus data plots; aiming to replicate previous student’s Monte Carlo analysis.
* Proposal:
  * Use RO Crates (lightweight, machine-readable JSON-LD descriptions) for:
    * Capturing metrics datasets with metadata (anonymised where necessary).
    * Supporting reproducibility and easier sharing within the group and potentially externally.
* Links to the green metadata work we did with Loïc Lannelongue at CW25 (carbon usage reporting).
* Motivation:
  - Captured and shared once with Luis: a manual approach was fine.
  - Sharing and explaining a second time (with Rosie): would have been useful to capture things better.
  - If we're likely to want to share more times, some light-touch, automation-friendly changes would make it easier for the next person (whether another MPhys student or beyond) to pick up what has been captured.
* Next step: Michael to create an example RO Crate for group feedback (decide which information to include/anonymise).
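Ahead of Michael's example crate, here is a minimal sketch of what an RO-Crate (v1.1) for one shared metrics export could look like. File names, descriptions and dates are placeholders, not agreed content.

```python
# Minimal sketch of an RO-Crate (RO-Crate 1.1) describing one exported metrics
# dataset. All names, descriptions and dates below are placeholders for
# whatever the group decides to include/anonymise.
import json

crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {   # the metadata descriptor itself
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {   # the root dataset: one extracted Prometheus export
            "@id": "./",
            "@type": "Dataset",
            "name": "Manchester Prometheus power/usage export (placeholder)",
            "description": "One week of selected node metrics (placeholder description).",
            "datePublished": "2025-07-30",
            "hasPart": [{"@id": "metrics.csv"}],
        },
        {   # the data file shipped alongside the metadata
            "@id": "metrics.csv",
            "@type": "File",
            "name": "Selected metrics, CSV export",
            "encodingFormat": "text/csv",
        },
    ],
}

with open("ro-crate-metadata.json", "w") as fh:
    json.dump(crate, fh, indent=2)
```

The crate is just this one JSON file sitting alongside the data it describes, so it stays compatible with the current manual sharing approach.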
## Date/time of next meeting
* 13th August 1pm BST, 2pm CERN
* Likely to be around: Emanuele, Caterina, Sudha, Rosie, Sakshi
May need to defer detailed discussions to September (e.g. integrating Emanuele's Condor exporters; see AOB).
## AOB
* Alessandra suggested reviewing Emanuele’s exporters (e.g. HTCondor) and exploring whether they can be applied in Manchester to better match jobs with machine status.
* Action to revisit this when all data/metrics are available (likely September).