TREASURE: Tokenizing HEP Collider Data for AI Discovery

Name: TREASURE: Tokenizing HEP Collider Data for AI Discovery
Start: 2026-04-27T08:30:00-04:00
End: 2026-04-29T15:30:00-04:00
Location: BNL

27 Apr 2026, 08:30 → 29 Apr 2026, 15:30 US/Eastern

Large Seminar Room (BNL)

Liza Brost (Brookhaven National Laboratory (US)), Viviana Cavaliere (Brookhaven National Lab)

Description

The TREASURE project (Tokenized Representations for Energy-frontier AI Searches via Understanding and REasoning) is a DOE HEP pilot initiative designed to bridge the gap between experimental physics and frontier AI research. The first workshop at BNL will focus on the technical frameworks required to curate and tokenize data from multiple collider experiments, enabling a new paradigm of cross-experimental discovery.

Key Focus Areas:

- Multi-Level Tokenization: Converting complex particle physics events (jets, tracks, and calorimeter hits) into discrete tokens suitable for Transformer-based architectures.
- AI-Readiness Protocols: Establishing standards for data curation and metadata that ensure high-fidelity training for Foundation Models.
- Physics Benchmarking: Assessing model performance across critical tasks, including pattern recognition, Higgs physics, and new physics searches.
- Collaborative Infrastructure: Building the "American Science Cloud" (AmSC) Intelligent Data Activities to support scalable AI research across the national lab complex.

BNL Event Code of Conduct

National Lab Contacts:

Paolo Calafiura (LBNL)
Viviana Cavaliere (BNL)
Walter Hopkins (ANL)
Michael Kagan (SLAC)
Kevin Pedro (FNAL)

Participants: 26

Monday 27 April
- 08:30 → 09:00
  
  Registration and Coffee 30m
- 09:00 → 10:45
  AmSC Context & Integration: TREASURE and friends
  - 09:00
    
    Welcome and TREASURE context 20m
    
    Speaker: Viviana Cavaliere (Brookhaven National Lab)
  - 09:25
    
    Connections to Nuclear Physics 30m
  - 10:05
    
    FM4NPP 30m
    
    Speaker: Yi Huang
- 10:45 → 11:00
  
  Coffee Break 15m
- 11:00 → 12:30
  AmSC Context & Integration
  - 11:00
    
    American Science Cloud 45m
    
    Speaker: Debbie Bard (LBNL)
  - 11:45
    
    Data Broker and Standards (ModCon) 45m
    
    Speaker: Laura Biven (JLab)
- 12:30 → 13:30
  
  Working Lunch 1h
- 13:30 → 15:20
  Planning: Open Data
  - 13:30
    
    LHC Open Data: Discussion on Status & Opportunities 30m
    
    Speaker: Beojan Stanislaus (Lawrence Berkeley National Lab. (US))
  - 14:05
    
    Planning for Low-Level Detector Data 30m
  - 14:40
    
    Produce Work Plan on Open Data 30m
    
    (all)
- 15:20 → 15:35
  
  Coffee Break 15m
- 15:35 → 17:05
  Planning: Future Activities
  - 15:35
    
    Planning for Additional Datasets (incl. Tevatron) 30m
  - 16:05
    
    Where to store tokenized data? 30m
  - 16:35
    
    Discussion 30m
- 18:00 → 19:30
  
  Group Dinner 1h 30m
  
  not hosted
Tuesday 28 April
- 09:00 → 10:45
  Topical presentations from TREASURE people
  - 09:00
    
    CMS (jets) 20m
    
    Speaker: Oz Amram (Fermi National Accelerator Lab. (US))
  - 09:25
    
    ATLAS (jets) 20m
    
    Speaker: Jeffrey Krupa (SLAC)
  - 09:50
    
    Tracking needs 10m
    
    Speaker: Punit Sharma (Brookhaven National Laboratory (US))
  - 10:05
    
    Event-level Tokenization 20m
- 10:45 → 11:00
  
  Coffee Break 15m
- 11:00 → 12:40
  Topical presentations from TREASURE people
  - 11:00
    
    Belle II 10m
  - 11:15
    
    FCC-ee 10m
  - 11:30
    
    Self-Supervised Learning 10m
    
    Speaker: Ho Fung Tsoi (University of Pennsylvania (US))
  - 11:45
    
    Discussion 45m
- 12:30 → 13:30
  
  Working Lunch 1h
- 13:30 → 15:00
  
  Working time
- 15:00 → 15:30
  
  Coffee Break 30m
- 15:30 → 16:30
  
  BNL Physics Colloquium "An Open Future for ATLAS, HEP, and Global Software and Computing" 1h
  
  Zach Marshall (LBNL) https://indico.bnl.gov/event/31454/
- 16:30 → 17:00
  
  Working time
Wednesday 29 April
- 09:00 → 10:45
  Foundation Model Architecture & Training
  - 09:00
    
    Foundation Model Architecture & Training 30m
    
    Speaker: Michael Kagan (SLAC National Accelerator Laboratory (US))
  - 09:45
    
    Benchmark Tasks & Evaluation Metrics 30m
- 10:45 → 11:00
  
  Coffee Break 15m
- 11:00 → 12:30
  Foundation Model Architecture & Training
  - 11:00
    
    Resource Needs and Connections to Industry 30m
- 12:30 → 13:30
  
  Working Lunch 1h
- 13:30 → 15:00
  
  Check in on Deliverables and Define Next Steps
  
  report from each work team
- 15:00 → 15:15
  
  Workshop Concludes 15m
