TREASURE: Tokenizing HEP Collider Data for AI Discovery

US/Eastern
Large Seminar Room (BNL)

Liza Brost (Brookhaven National Laboratory (US)), Viviana Cavaliere (Brookhaven National Lab)
Description

The TREASURE project (Tokenized Representations for Energy-frontier AI Searches via Understanding and REasoning) is a DOE HEP pilot initiative designed to bridge the gap between experimental physics and frontier AI research. The first workshop at BNL will focus on the technical frameworks required to curate and tokenize data from multiple collider experiments, enabling a new paradigm of cross-experimental discovery.

Key Focus Areas:

  • Multi-Level Tokenization: Converting complex particle physics events (jets, tracks, and calorimeter hits) into discrete tokens suitable for Transformer-based architectures.

  • AI-Readiness Protocols: Establishing standards for data curation and metadata that ensure high-fidelity training for Foundation Models.

  • Physics Benchmarking: Assessing model performance across critical tasks, including pattern recognition, Higgs physics, and new physics searches.

  • Collaborative Infrastructure: Building the "American Science Cloud" (AmSC) Intelligent Data Activities to support scalable AI research across the national lab complex.
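
As a rough illustration of the tokenization idea in the first focus area, the sketch below maps each reconstructed particle's (pT, eta, phi) onto a single integer token by simple binning. This is not the TREASURE scheme: the bin counts, ranges, and function names are hypothetical and chosen only for illustration, and learned tokenizers are an equally plausible alternative.

```python
import math

# Hypothetical binning, chosen only for illustration (not from the workshop).
N_PT_BINS, N_ETA_BINS, N_PHI_BINS = 8, 10, 12
PT_MIN, PT_MAX = 0.5, 500.0        # GeV; pT is binned in log space
ETA_MIN, ETA_MAX = -2.5, 2.5
VOCAB_SIZE = N_PT_BINS * N_ETA_BINS * N_PHI_BINS


def _bin(value, lo, hi, n_bins):
    """Map value to an integer bin in [0, n_bins - 1], clamping out-of-range input."""
    frac = (value - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def tokenize_particle(pt, eta, phi):
    """Combine the three per-feature bins into one token id in [0, VOCAB_SIZE)."""
    pt_bin = _bin(math.log(max(pt, PT_MIN)),
                  math.log(PT_MIN), math.log(PT_MAX), N_PT_BINS)
    eta_bin = _bin(eta, ETA_MIN, ETA_MAX, N_ETA_BINS)
    phi_bin = _bin(phi, -math.pi, math.pi, N_PHI_BINS)
    return (pt_bin * N_ETA_BINS + eta_bin) * N_PHI_BINS + phi_bin


def tokenize_event(particles):
    """Order particles by descending pT and return their token ids as a sequence."""
    ordered = sorted(particles, key=lambda p: p[0], reverse=True)
    return [tokenize_particle(pt, eta, phi) for pt, eta, phi in ordered]


if __name__ == "__main__":
    # Toy event: (pT [GeV], eta, phi) triples with made-up values.
    event = [(12.0, 0.3, 1.1), (85.0, -1.2, -2.0), (3.5, 2.1, 0.4)]
    print(tokenize_event(event))
```

The resulting integer sequence is the kind of input a Transformer embedding layer expects. A fixed grid like this trades resolution for vocabulary size, which is one reason multi-level or learned schemes are of interest.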

 

BNL Event Code of Conduct

National Lab Contacts:

Paolo Calafiura (LBNL)
Viviana Cavaliere (BNL)
Walter Hopkins (ANL)
Michael Kagan (SLAC)
Kevin Pedro (FNAL)

 

TREASURE playlist

 

    • 8:30 AM
      Registration and Coffee
    • AmSC Context & Integration: TREASURE and friends
      • 1
        Welcome and TREASURE context
        Speaker: Viviana Cavaliere (Brookhaven National Lab)
      • 2
        Connections to Nuclear Physics
        Speaker: Jamie Dunlop
      • 3
        FM4NPP
        Speaker: Yi Huang
    • 10:45 AM
      Coffee Break
    • AmSC Context & Integration
      • 4
        American Science Cloud
        Speaker: Debbie Bard (LBNL)
      • 5
        Data Broker and Standards (ModCon)
        Speaker: Laura Biven (JLab)
    • 12:30 PM
      Working Lunch
    • Planning: Open Data
      • 6
        LHC Open Data: Discussion on Status & Opportunities
        Speaker: Beojan Stanislaus (Lawrence Berkeley National Lab. (US))
      • 7
        Planning for Low-Level Detector Data
      • 8
        Produce Work Plan on Open Data

        (all)

    • 3:20 PM
      Coffee Break
    • Planning: Future Activities
      • 9
        Where to store tokenized data?
        Speaker: Scarlet Rachel Norberg (Fermi National Accelerator Lab. (US))
      • 10
        Event-level tokenization discussion
        Speaker: Viviana Cavaliere (Brookhaven National Lab)
      • 11
        Discussion
    • 6:30 PM
      Dinner and golf at Topgolf

      not hosted

    • Topical presentations
      • 12
        CMS (jets)
        Speaker: Oz Amram (Fermi National Accelerator Lab. (US))
      • 13
        ATLAS (jets)
        Speaker: Jeffrey Krupa (SLAC)
      • 14
        Tracking needs
        Speaker: Punit Sharma (Brookhaven National Laboratory (US))
      • 15
        AI for operations
        Speaker: Walter Hopkins (Argonne National Laboratory (US))
    • 10:45 AM
      Coffee Break
    • Foundation Models and AI in Trigger
      • 16
        The future of Foundation Models
        Speaker: Michael Kagan (SLAC National Accelerator Laboratory (US))
      • 17
        AI in Trigger
        Speaker: Jennifer Ngadiuba (Fermi National Accelerator Lab. (US))
    • 12:30 PM
      Working Lunch
    • Working time
    • 3:00 PM
      Coffee Break
    • 3:30 PM
      BNL Physics Colloquium "An Open Future for ATLAS, HEP, and Global Software and Computing"

      Zach Marshall (LBNL) https://indico.bnl.gov/event/31454/

    • Working time
    • Topical presentations
      • 18
        Belle II
        Speaker: Riccardo Manfredi (Brookhaven National Laboratory (US))
      • 19
        FCC-ee
        Speaker: Andrea Sciandra (Brookhaven National Laboratory (US))
      • 20
        Self-supervised learning
        Speaker: Ho Fung Tsoi (University of Pennsylvania (US))
      • 21
        Compression
        Speaker: Antonio Boveia (Ohio State University)
      • 22
        Discussion
    • 10:30 AM
      Coffee Break
    • Working time
    • 12:00 PM
      Working Lunch
    • Check in on Deliverables and Define Next Steps

      report from each work team

    • 2:30 PM
      Workshop Concludes