7–10 Oct 2025
Inn at Penn, University of Pennsylvania
US/Eastern timezone

PQuant: Streamlining ML Model Compression to Deployment for Next-Gen Detector Systems

9 Oct 2025, 15:20
20m
St Marks

Parallel session talk RDC 5 Trigger & DAQ SHARED SESSION II

Speaker

Arghya Ranjan Das (Purdue University (US))

Description

Real-time machine learning is emerging as a key tool for next-generation detector systems, where strict latency and hardware constraints require highly efficient models. We present PQuant, a backend-agnostic Python library that unifies and streamlines pruning and quantization techniques for hardware deployment, supporting both PyTorch and TensorFlow. PQuant provides a comprehensive suite of pruning methods: unstructured pruning, structured pruning (PDP and ActivationPruning), and, through the MDMM framework, hardware-aware resource (DSP/BRAM) optimization and pattern compression for convolutional kernels on FPGAs/ASICs. PQuant also provides flexible quantization options, ranging from fixed-point to high-granularity schemes, with per-layer or per-weight bit control. Integration with hls4ml is ongoing and will enable compressed models to be deployed directly to FPGAs/ASICs. By connecting advanced compression methods to a hardware implementation in which compression translates directly into resource savings, PQuant provides a practical path to low-latency ML in triggers, DAQ, and online reconstruction for high-energy physics experiments.
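To make the two core ideas concrete, the following is a minimal NumPy sketch of unstructured magnitude pruning and signed fixed-point quantization with either per-layer or per-weight bit widths. It does not use or reproduce the PQuant API: the function names `magnitude_prune` and `fixed_point_quantize` and their parameters are hypothetical, and the bit-width convention (integer bits include the sign bit, in the style of `ap_fixed<W, I>`) is an assumption made for illustration.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Unstructured pruning: zero the fraction `sparsity` of smallest-|w| weights.

    Hypothetical helper for illustration; not part of PQuant.
    Returns the pruned weights and the boolean keep-mask.
    """
    w = np.asarray(weights, dtype=np.float64)
    k = int(round(sparsity * w.size))  # number of weights to remove
    mask = np.ones(w.size, dtype=bool)
    if k > 0:
        # Stable sort so ties are broken by position, deterministically.
        order = np.argsort(np.abs(w).ravel(), kind="stable")
        mask[order[:k]] = False
    mask = mask.reshape(w.shape)
    return np.where(mask, w, 0.0), mask


def fixed_point_quantize(x, total_bits, int_bits):
    """Round to signed fixed point and saturate on overflow.

    Assumed convention: `total_bits` is the full width and `int_bits`
    counts the integer bits including the sign bit. Both may be scalars
    (per-layer control) or arrays broadcastable to `x` (per-weight control).
    """
    x = np.asarray(x, dtype=np.float64)
    total_bits = np.asarray(total_bits, dtype=np.float64)
    int_bits = np.asarray(int_bits, dtype=np.float64)
    scale = 2.0 ** (total_bits - int_bits)   # 2^(fractional bits)
    q_max = 2.0 ** (total_bits - 1) - 1      # largest integer code
    q_min = -(2.0 ** (total_bits - 1))       # smallest integer code
    # np.round rounds exact halves to the nearest even integer.
    codes = np.clip(np.round(x * scale), q_min, q_max)
    return codes / scale


weights = np.array([[0.9, -0.05], [0.3, -1.7]])
pruned, mask = magnitude_prune(weights, sparsity=0.5)

# Per-layer control: one bit width for the whole tensor.
per_layer = fixed_point_quantize(pruned, total_bits=8, int_bits=2)

# Per-weight control: a bit-width array with the same shape as the tensor.
bits = np.array([[8, 4], [4, 8]])
per_weight = fixed_point_quantize(weights, total_bits=bits, int_bits=2)
```

In this sketch a pruned weight stays exactly zero after quantization, so sparsity and reduced bit width compose; how such savings map to DSP or BRAM usage depends on the synthesis tool and is not modeled here.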

Author

Co-authors

Chang Sun (California Institute of Technology (US))
Anastasiia Petrovych (CERN)
Dimitrios Danopoulos (CERN)
Enrico Lupi (CERN, INFN Padova (IT))
Arghya Ranjan Das (Purdue University (US))
Sebastian Dittmeier (Ruprecht-Karls-Universitaet Heidelberg (DE))
Michael Kagan (SLAC National Accelerator Laboratory (US))
Miaoyuan Liu (Purdue University (US))
Vladimir Loncar (CERN)
