TREASURE: Tokenizing HEP Collider Data for AI Discovery
Large Seminar Room
BNL

The TREASURE project (Tokenized Representations for Energy-frontier AI Searches via Understanding and REasoning) is a DOE HEP pilot initiative designed to bridge the gap between experimental physics and frontier AI research. The first workshop at BNL will focus on the technical frameworks required to curate and tokenize data from multiple collider experiments, enabling a new paradigm of cross-experimental discovery.
Key Focus Areas:
- Multi-Level Tokenization: Converting complex particle physics events (jets, tracks, and calorimeter hits) into discrete tokens suitable for Transformer-based architectures.
- AI-Readiness Protocols: Establishing standards for data curation and metadata that ensure high-fidelity training for Foundation Models.
- Physics Benchmarking: Assessing model performance across critical tasks, including pattern recognition, Higgs physics, and new physics searches.
- Collaborative Infrastructure: Building the "American Science Cloud" (AmSC) Intelligent Data Activities to support scalable AI research across the national lab complex.
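
To make the tokenization focus area concrete, here is a minimal sketch of one simple approach: binning reconstructed-jet kinematics into a discrete vocabulary. It is an illustration only, not the TREASURE scheme. The bin edges, the `bin_index` and `tokenize_jets` helpers, and the choice of (pT, eta, phi) as inputs are all assumptions made for this example.

```python
import numpy as np

# Hypothetical binning choices, for illustration only (not project settings).
PT_EDGES = np.logspace(np.log10(20.0), np.log10(2000.0), 9)  # 8 log-spaced pT bins, GeV
ETA_EDGES = np.linspace(-2.5, 2.5, 11)                       # 10 pseudorapidity bins
PHI_EDGES = np.linspace(-np.pi, np.pi, 9)                    # 8 azimuthal bins

N_PT = len(PT_EDGES) - 1
N_ETA = len(ETA_EDGES) - 1
N_PHI = len(PHI_EDGES) - 1
VOCAB_SIZE = N_PT * N_ETA * N_PHI


def bin_index(values, edges):
    """Map values to bin indices in [0, n_bins - 1], clipping out-of-range values."""
    n_bins = len(edges) - 1
    idx = np.digitize(values, edges) - 1
    return np.clip(idx, 0, n_bins - 1)


def tokenize_jets(pt, eta, phi):
    """Return one integer token per jet by flattening the (pT, eta, phi) bin indices."""
    i_pt = bin_index(np.asarray(pt, dtype=float), PT_EDGES)
    i_eta = bin_index(np.asarray(eta, dtype=float), ETA_EDGES)
    i_phi = bin_index(np.asarray(phi, dtype=float), PHI_EDGES)
    return (i_pt * N_ETA + i_eta) * N_PHI + i_phi


if __name__ == "__main__":
    # Two made-up jets: (pT [GeV], eta, phi)
    tokens = tokenize_jets([45.0, 510.0], [0.3, -1.9], [1.2, -2.7])
    print(tokens, "vocabulary size:", VOCAB_SIZE)
```

Each jet becomes a single integer in `[0, VOCAB_SIZE)`, so an event can be represented as a sequence of such tokens for a Transformer-style model. A multi-level scheme would presumably add separate vocabularies for lower-level objects such as tracks and calorimeter hits, which this sketch does not attempt.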
National Lab Contacts:
Paolo Calafiura (LBNL)
Viviana Cavaliere (BNL)
Michael Kagan (SLAC)
Walter Hopkins (ANL)
Kevin Pedro (FNAL)