TREASURE: Tokenizing HEP Collider Data for AI Discovery

US/Eastern
Large Seminar Room (BNL)

Liza Brost (Brookhaven National Laboratory (US)), Viviana Cavaliere (Brookhaven National Lab)
Description

The TREASURE project (Tokenized Representations for Energy-frontier AI Searches via Understanding and REasoning) is a DOE HEP pilot initiative designed to bridge the gap between experimental physics and frontier AI research. The first workshop at BNL will focus on the technical frameworks required to curate and tokenize data from multiple collider experiments, enabling a new paradigm of cross-experimental discovery.

Key Focus Areas:

  • Multi-Level Tokenization: Converting complex particle physics events (jets, tracks, and calorimeter hits) into discrete tokens suitable for Transformer-based architectures.

  • AI-Readiness Protocols: Establishing standards for data curation and metadata that ensure high-fidelity training for Foundation Models.

  • Physics Benchmarking: Assessing model performance across critical tasks, including pattern recognition, Higgs physics, and new physics searches.

  • Collaborative Infrastructure: Building the "American Science Cloud" (AmSC) Intelligent Data Activities to support scalable AI research across the national lab complex.
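As a rough illustration of the first focus area, one simple way to turn reconstructed objects into discrete tokens is to bin their kinematics and combine the bin indices into a single integer ID. The sketch below is not the TREASURE scheme, which the workshop is intended to define. The bin counts, the kinematic ranges, and the function names (tokenize_jet, tokenize_event) are all assumptions made for this example.

```python
import math

# Illustrative assumptions only: ranges and bin counts are not from the workshop.
PT_MIN, PT_MAX = 20.0, 2000.0        # jet pT range in GeV, binned in log(pT)
ETA_MIN, ETA_MAX = -2.5, 2.5         # pseudorapidity range
PHI_MIN, PHI_MAX = -math.pi, math.pi
N_BINS = 16                          # bins per variable, so 16**3 = 4096 token IDs


def _bin(value, lo, hi, n_bins):
    """Map value to an integer bin in [0, n_bins - 1], clamping out-of-range input."""
    frac = (value - lo) / (hi - lo)
    return min(n_bins - 1, max(0, int(frac * n_bins)))


def tokenize_jet(pt, eta, phi):
    """Encode one jet as a single integer token from its binned (pT, eta, phi)."""
    log_pt = math.log(max(pt, PT_MIN))  # guard against pT below the lower edge
    pt_bin = _bin(log_pt, math.log(PT_MIN), math.log(PT_MAX), N_BINS)
    eta_bin = _bin(eta, ETA_MIN, ETA_MAX, N_BINS)
    phi_bin = _bin(phi, PHI_MIN, PHI_MAX, N_BINS)
    return (pt_bin * N_BINS + eta_bin) * N_BINS + phi_bin


def tokenize_event(jets):
    """Turn a list of (pt, eta, phi) jets into a pT-ordered token sequence."""
    ordered = sorted(jets, key=lambda jet: jet[0], reverse=True)
    return [tokenize_jet(pt, eta, phi) for pt, eta, phi in ordered]


if __name__ == "__main__":
    print(tokenize_event([(50.0, 0.3, 1.2), (300.0, -1.1, -2.0)]))
```

A fixed grid like this is lossy and ignores tracks and calorimeter hits. Learned codebooks or separate vocabularies per object type are alternatives of the kind the multi-level tokenization sessions would be expected to compare.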

 

National Lab Contacts:

Paolo Calafiura (LBNL)
Viviana Cavaliere (BNL)
Michael Kagan (SLAC)
Walter Hopkins (ANL)
Kevin Pedro (FNAL)

Registration
Register for the first TREASURE workshop at BNL!
    • 08:30 09:00
      Registration and Coffee 30m
    • 09:00 10:30
      AmSC Context & Integration
      • 09:00
        Welcome 15m
      • 09:30
        Genesis Context 1h
    • 10:30 10:45
      Coffee Break 15m
    • 10:45 12:30
      AmSC Context & Integration: TREASURE and friends
      • 10:45
        TREASURE 45m
      • 11:30
        Connections to NP 1h
    • 12:30 13:30
      Working Lunch 1h
    • 13:30 15:30
      Planning: Future Activities
      • 13:30
        Planning for Low-Level Detector Data 30m
      • 14:00
        Planning for Additional Datasets 30m
      • 14:30
        Discussion 30m
    • 15:30 15:45
      Coffee Break 15m
    • 15:45 17:00
      Planning: Open Data
      • 15:45
        LHC Open Data: Status & Opportunities 45m
    • 17:30 19:00
      Group Dinner 1h 30m

      not hosted

    • 09:00 10:45
      Tokenization Standards
      • 09:00
        Multi-Level Tokenization Standards 1h 30m
    • 10:45 11:00
      Coffee Break 15m
    • 11:00 12:30
      Tokenization Standards: Objects
      • 11:00
        Jets / Calorimeter Objects 45m
      • 11:45
        Other Objects 45m
    • 12:30 13:30
      Working Lunch 1h
    • 13:30 15:00
      Working time
    • 15:00 15:30
      Coffee Break 30m
    • 15:30 16:30
      Physics Colloquium (HL-LHC / Future S&C and Open Data/Science) 1h
    • 16:30 17:00
      Working time
    • 09:00 10:45
      Foundation Model Architecture & Training
      • 09:00
        Foundation Model Architecture & Training 30m
      • 09:45
        Benchmark Tasks & Evaluation Metrics 30m
    • 10:45 11:00
      Coffee Break 15m
    • 11:00 12:30
      Foundation Model Architecture & Training
      • 11:00
        Task Applications 30m
    • 12:30 13:30
      Working Lunch 1h
    • 13:30 15:00
      Working time
    • 15:00 15:15
      Workshop Concludes 15m