TREASURE: Tokenizing HEP Collider Data for AI Discovery

US/Eastern
Large Seminar Room (BNL)

Liza Brost (Brookhaven National Laboratory (US)), Viviana Cavaliere (Brookhaven National Lab)
Description

The TREASURE project (Tokenized Representations for Energy-frontier AI Searches via Understanding and REasoning) is a DOE HEP pilot initiative designed to bridge the gap between experimental physics and frontier AI research. The first workshop at BNL will focus on the technical frameworks required to curate and tokenize data from multiple collider experiments, enabling a new paradigm of cross-experimental discovery.

Key Focus Areas:

  • Multi-Level Tokenization: Converting complex particle physics events (jets, tracks, and calorimeter hits) into discrete tokens suitable for Transformer-based architectures.

  • AI-Readiness Protocols: Establishing standards for data curation and metadata that ensure high-fidelity training for Foundation Models.

  • Physics Benchmarking: Assessing model performance across critical tasks, including pattern recognition, Higgs physics, and new physics searches.

  • Collaborative Infrastructure: Building the "American Science Cloud" (AmSC) Intelligent Data Activities to support scalable AI research across the national lab complex.

BNL Event Code of Conduct

National Lab Contacts:

Paolo Calafiura (LBNL)
Viviana Cavaliere (BNL)
Walter Hopkins (ANL)
Michael Kagan (SLAC)
Kevin Pedro (FNAL)

Treasure playlist:
    • 08:30 09:00
      Registration and Coffee 30m
    • 09:00 10:45
      AmSC Context & Integration: TREASURE and friends
    • 10:45 11:00
      Coffee Break 15m
    • 11:00 12:30
      AmSC Context & Integration
    • 12:30 13:30
      Group Photo 📸 + Working Lunch 1h
    • 13:20 15:20
      Planning: Open Data
      Convener: Beojan Stanislaus (Lawrence Berkeley National Lab. (US))
      • 13:30
        Status reports on pre-PCDF Open Data release 30m

        (all)

        Speakers: Abhijith Gandrakota (Fermi National Accelerator Lab. (US)), Beojan Stanislaus (Lawrence Berkeley National Lab. (US)), Oz Amram (Fermi National Accelerator Lab. (US))
      • 14:00
        LHC Open Data: Discussion on Status & Opportunities 30m
        Speakers: Beojan Stanislaus (Lawrence Berkeley National Lab. (US)), Matthew Bellis (Cornell University (US) / Siena University (US)), Zach Marshall (Lawrence Berkeley National Lab. (US))
      • 14:30
        Planning for Low-Level Detector Data 30m
    • 15:20 15:35
      Coffee Break 15m
    • 15:35 17:35
      Planning: Future Activities
    • 18:30 20:30
      Dinner and golf at Top Golf 2h

      not hosted

    • 08:30 09:00
      Breakfast 30m
    • 09:00 10:45
      Topical presentations
    • 10:45 11:00
      Coffee Break 15m
    • 11:00 12:30
      Foundation Models and AI in Trigger
    • 12:30 13:30
      Working Lunch 1h
    • 13:30 15:00
      Working time
    • 15:00 15:30
      Coffee Break 30m
    • 15:30 16:30
      BNL Physics Colloquium "An Open Future for ATLAS, HEP, and Global Software and Computing" 1h

      Zach Marshall (LBNL) https://indico.bnl.gov/event/31454/

    • 16:30 17:00
      Working time
    • 08:30 09:00
      Breakfast 30m
    • 09:00 10:30
      Topical presentations
    • 10:30 10:45
      Coffee Break 15m
    • 10:45 12:00
      Working time
    • 12:00 13:00
      Working Lunch 1h
    • 13:00 14:30
      Check in on Deliverables and Define Next Steps

      report from each work team

      • 13:00
        Closeout 1h
        Speaker: Viviana Cavaliere (Brookhaven National Lab)
    • 14:30 14:45
      Workshop Concludes 15m