

















# Imperial College London



# System Design and Prototyping for the CMS Level-1 Trigger at the High-Luminosity LHC

Alex Tapper for the CMS Collaboration



The Phase-2 Upgrade of the CMS Level-1 Trigger

CERN-LHCC-2020-004; CMS-TDR-021 <a href="http://cds.cern.ch/record/2714892">http://cds.cern.ch/record/2714892</a>

## Introduction to High-Luminosity LHC







- Initial LHC design luminosity 1 × 10<sup>34</sup> cm<sup>-2</sup> s<sup>-1</sup> → already exceeded by a factor of 2 in Run 2
- High-Luminosity era 5-7.5 × 10<sup>34</sup> cm<sup>-2</sup> s<sup>-1</sup> → factor of 5 to 7.5 beyond design specification
- Accumulate 3000-4000 fb<sup>-1</sup> → extend physics reach

### **Detector challenges**



- Number of simultaneous proton-proton interactions (pileup)
  - Design specification ~20 interactions/bunch crossing
  - HL-LHC 140-200 interactions/bunch crossing
- Higher pileup → higher occupancy, degraded performance (e.g. failure of pattern recognition)

Trigger rates increase with instantaneous luminosity and performance degrades with pileup (e.g. isolation)



| Run period | W → ℓν rate |
|------------|-------------|
| Run 1      | 80 Hz       |
| Run 2      | 200 Hz      |
| Run 3      | 400-600 Hz  |
| HL-LHC     | 1 kHz       |

Current L1 trigger → 4 MHz rate at the HL-LHC

- Increased particle flux → high radiation dose
- Detector performance degraded → lower response, higher noise

### **CMS Detector upgrade**



- Major upgrade to detector
  - Replacing tracker, end-cap calorimetry, additional muon detectors
  - New trigger and DAQ systems

All silicon tracking system with pixels and silicon strips

Over 200 m<sup>2</sup> of silicon, 10<sup>9</sup> channels, ~100 µm strips

Outer strip tracker used in L1 trigger: 6 layers in barrel and 4 disks of sensors

Tracker delivers full tracks to the L1 trigger, e.g. for vertex finding



- p<sub>T</sub>-modules → doublet sensors with common electronics to correlate hits and form stubs for the trigger
- Distance between sensors gives the lower p<sub>T</sub> cut on tracks
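The stub idea can be illustrated with a little geometry. The sketch below is not the CMS module logic; it assumes a track from the beam line bending in a uniform 3.8 T field and a barrel module at a given radius. The function names and the example radius, sensor gap and acceptance window are hypothetical values chosen for illustration only.

```python
import math

B_FIELD_T = 3.8  # CMS solenoid field in tesla


def stub_displacement_m(pt_gev, radius_m, sensor_gap_m, b_field_t=B_FIELD_T):
    """Transverse offset between the hits in the two sensors of a doublet.

    Assumes a track from the beam line in a uniform field (barrel geometry).
    """
    # Radius of curvature from pT [GeV] ~ 0.3 * B [T] * R [m]
    curvature_radius_m = pt_gev / (0.3 * b_field_t)
    sin_alpha = radius_m / (2.0 * curvature_radius_m)
    if sin_alpha >= 1.0:
        raise ValueError("track curls up before reaching this radius")
    # alpha: angle between the track direction and the radial direction
    alpha = math.asin(sin_alpha)
    return sensor_gap_m * math.tan(alpha)


def passes_stub_window(pt_gev, radius_m, sensor_gap_m, window_m):
    """Accept the hit pair if the offset fits inside the acceptance window."""
    return stub_displacement_m(pt_gev, radius_m, sensor_gap_m) <= window_m


if __name__ == "__main__":
    # Hypothetical module: 0.6 m radius, 1.8 mm sensor gap, 0.35 mm window
    for pt in (1.0, 2.0, 5.0, 10.0):
        dx = stub_displacement_m(pt, radius_m=0.6, sensor_gap_m=1.8e-3)
        ok = passes_stub_window(pt, 0.6, 1.8e-3, window_m=0.35e-3)
        print(f"pT = {pt:4.1f} GeV: offset = {dx * 1e3:.3f} mm, stub = {ok}")
```

Low-p<sub>T</sub> tracks bend more and give a larger offset, so a fixed window on the offset acts as a lower p<sub>T</sub> cut, and a larger sensor gap makes the same window stricter.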



- Factor of 10 data reduction → control of trigger rates
- FPGA-based track finding @ 40 MHz in 4 μs


# High Granularity Calorimeter with 4D (space-time) shower measurement

Sampling calorimeter: silicon sensors, optimised for high pileup

High-granularity readout (~1 cm²) and precision timing (<50 ps)





300 GeV pions

- ~600 m<sup>2</sup> of silicon, 6M channels
- ~100 µm strips
- 28 electromagnetic layers (14 for L1 trigger), 22 hadronic layers
- 4 cm<sup>2</sup> trigger granularity

Delivers 3D clusters to the L1 trigger, latency 4 µs

#### **Technology R&D examples**







ATCA-based electronics R&D

Generic high-I/O processing boards

Wide range of testing and prototypes

e.g. extensive link tests @ 28 Gb/s & thermal cycle testing and simulation





- Xilinx Virtex UltraScale+ (VU9P) FPGA
- Optical links running up to 28 Gb/s
- Xilinx Zynq SoC for control (dual-core ARM)
- Option for 128 GB memory for LUT applications





- Carrier board with two sites for daughter cards
- High density, low profile interposer to mount daughter cards with FPGAs
- Optical links running up to 28 Gb/s
- Commercial COM Express control with x86 CPU









# Trigger system design



Provides robust independent triggers for **calorimeter**, **muon** and **tracking** systems separately, and a *Particle Flow* trigger, which combines detector information, all feeding into a **global trigger** 

#### **Detector inputs**

| Detector       | Object   | N bits/object | N objects | N bits/BX | Required BW (Gb/s) |
|----------------|----------|---------------|-----------|-----------|--------------------|
| TRK            | Track    | 96            | 1 665     | 159 840   | 6 394              |
| EB             | Crystal  | 16            | 61 200    | 979 200   | 39 168             |
| EB             | Clusters | 40            | 50        | 2 000     | 80                 |
| HB             | Tower    | 16            | 2 304     | 36 864    | 1 475              |
| HF             | Tower    | 10            | 1 440     | 13 824    | 553                |
| HGCAL          | Cluster  | 250           | 416       | 104 000   | 4 160              |
| HGCAL          | Tower    | 16            | 2 600     | 41 600    | 1 664              |
| MB DT+RPC (SP) | Stub     | 64            | 1 720     | 110 080   | 4 400              |
| ME CSC         | Stub     | 32            | 1 080     | 34 560    | 1 382              |
| ME RPC         | Cluster  | 16            | 2 304     | 36 864    | 1 475              |
| ME iRPC        | Cluster  | 24            | 288       | 6 912     | 276                |
| ME GEM         | Cluster  | 14            | 2 304     | 32 256    | 1 290              |
| ME0 GEM        | Stub     | 24            | 288       | 6 912     | 276                |
| Total          | -        | -             | -         | -         | 62 593             |
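The bandwidth column follows from the bits per bunch crossing (BX) multiplied by the bunch-crossing rate. A minimal sketch of that arithmetic, assuming a nominal 40 MHz crossing rate; the function names are illustrative, and the link count ignores protocol and encoding overheads, so it is only a lower bound:

```python
import math

BX_RATE_HZ = 40e6  # assumed nominal bunch-crossing rate


def required_bandwidth_gbps(bits_per_bx, bx_rate_hz=BX_RATE_HZ):
    """Bandwidth in Gb/s needed to ship bits_per_bx every bunch crossing."""
    return bits_per_bx * bx_rate_hz / 1e9


def min_links(total_gbps, link_speed_gbps):
    """Lower bound on the number of links, ignoring any protocol overhead."""
    return math.ceil(total_gbps / link_speed_gbps)


if __name__ == "__main__":
    # A few rows from the table: object type -> N bits/BX
    inputs_bits_per_bx = {
        "TRK track": 159_840,
        "EB crystal": 979_200,
        "HGCAL cluster": 104_000,
    }
    for name, bits in inputs_bits_per_bx.items():
        print(f"{name}: {required_bandwidth_gbps(bits):.0f} Gb/s")

    total_gbps = 62_593  # total quoted in the table
    for speed in (16, 25):
        print(f"at {speed} Gb/s per link: at least {min_links(total_gbps, speed)} links")
```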

#### System specification and constituents

- Increase bandwidth 100 kHz → 750 kHz
- Increase latency 3.8 µs → 12.5 µs (9.5 µs target, leaving contingency)
- Include high-granularity detector and tracker information
- Dedicated **scouting system** @ 40 MHz → streaming data

Optical link speeds 16/25 Gb/s as appropriate for application

Use of the largest FPGA parts where processing-bound, e.g. Xilinx Virtex UltraScale+ (VU9P/VU13P), and smaller parts where processing is less critical, e.g. Xilinx Kintex UltraScale

Over 200 FPGAs in total

Processing partitioned regionally and in time as appropriate
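To put the latency and rate targets of this specification in context, the sketch below converts them into bunch crossings and rate-reduction factors. It assumes a nominal 40 MHz crossing rate; the helper names are illustrative and not part of any CMS software.

```python
BX_RATE_HZ = 40e6  # assumed nominal bunch-crossing rate


def latency_in_bx(latency_us, bx_rate_hz=BX_RATE_HZ):
    """Number of bunch crossings that elapse during the given latency."""
    return latency_us * 1e-6 * bx_rate_hz


def rate_reduction(accept_rate_hz, bx_rate_hz=BX_RATE_HZ):
    """Factor by which the trigger must reduce the crossing rate."""
    return bx_rate_hz / accept_rate_hz


if __name__ == "__main__":
    for label, latency_us in (("current", 3.8), ("target", 9.5), ("maximum", 12.5)):
        print(f"{label} latency {latency_us} us = {latency_in_bx(latency_us):.0f} BX")
    for label, rate_hz in (("current", 100e3), ("upgraded", 750e3)):
        print(f"{label} output {rate_hz / 1e3:.0f} kHz: reduction x{rate_reduction(rate_hz):.0f}")
```

The latency in bunch crossings is also the depth of the buffers the detector front-ends need in order to hold data while the trigger decision is made.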

### Algorithm example: particle flow

 Aim to reconstruct and identify all particles in an event using all sub-detector information



- Efficient reconstruction of charged particles in the tracker, down to threshold of 2 GeV
- Fine granularity calorimetry to resolve the contributions from neighbouring particles
- PUPPI algorithm filters particles
  - Uses vertex to define a particle weight
  - Basically a probability of being prompt
- Ambitious algorithm for Level-1 trigger
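The idea of a per-particle weight derived from the vertex can be illustrated with a toy example. This is not the PUPPI implementation or the trigger firmware: it covers charged candidates only, uses a simple 0/1 weight from the distance between the track and the primary vertex along the beam axis, and all names and the 1 cm window are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChargedCandidate:
    pt: float  # transverse momentum in GeV
    z0: float  # track position along the beam axis in cm


def prompt_weight(cand, vertex_z_cm, dz_max_cm=1.0):
    """Toy weight: 1 if the track is compatible with the vertex, else 0."""
    return 1.0 if abs(cand.z0 - vertex_z_cm) < dz_max_cm else 0.0


def filter_candidates(cands, vertex_z_cm, min_weight=0.5, dz_max_cm=1.0):
    """Keep only candidates whose weight passes the threshold."""
    return [
        c for c in cands
        if prompt_weight(c, vertex_z_cm, dz_max_cm) >= min_weight
    ]


if __name__ == "__main__":
    candidates = [
        ChargedCandidate(pt=25.0, z0=0.2),   # close to the vertex
        ChargedCandidate(pt=3.0, z0=-6.5),   # likely pileup
        ChargedCandidate(pt=8.0, z0=-0.4),   # close to the vertex
    ]
    kept = filter_candidates(candidates, vertex_z_cm=0.0)
    print([c.pt for c in kept])
```

In a full algorithm the weight is continuous and neutral particles, which have no track, need a different treatment; the toy only shows how a vertex-based weight turns into a filter.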



Fits in the logic resources and meets timing in the target FPGA (Xilinx Virtex UltraScale+ VU9P) for the Particle Flow trigger






### Algorithm example: machine learning



- Current **global trigger**: possible to apply requirements on correlations between multiple objects (masses,  $\Delta \varphi$ ...)
- Natural continuation: instead of simple 1D cuts on objects and object correlations, use modern ML tools to build more powerful multivariate discriminators
- Software tools to port ML algorithms into FPGA firmware now exist (e.g. <u>hls4ml</u>)
- FPGA resources now allow it
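As a point of reference for what cut-based correlation requirements look like, here is a minimal sketch computing Δφ and the invariant mass of two objects and applying simple thresholds. It uses the standard massless-object approximation; the function names and thresholds are hypothetical and not taken from any trigger menu.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)
    if dphi > math.pi:
        dphi -= 2.0 * math.pi
    return dphi


def invariant_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two objects, treating both as massless."""
    m2 = 2.0 * pt1 * pt2 * (
        math.cosh(eta1 - eta2) - math.cos(delta_phi(phi1, phi2))
    )
    return math.sqrt(max(m2, 0.0))


def passes_correlation_cuts(obj1, obj2, min_mass, max_abs_dphi):
    """obj1 and obj2 are (pt, eta, phi) tuples; thresholds are hypothetical."""
    mass = invariant_mass(*obj1, *obj2)
    dphi = abs(delta_phi(obj1[2], obj2[2]))
    return mass >= min_mass and dphi <= max_abs_dphi


if __name__ == "__main__":
    jet_a = (60.0, 2.5, 0.3)    # forward jet
    jet_b = (45.0, -2.0, 2.0)   # backward jet
    print(f"mass = {invariant_mass(*jet_a, *jet_b):.1f} GeV")
    print(f"dphi = {delta_phi(jet_a[2], jet_b[2]):.2f}")
    print(passes_correlation_cuts(jet_a, jet_b, min_mass=400.0, max_abs_dphi=2.0))
```

A multivariate discriminator replaces the independent thresholds in `passes_correlation_cuts` with a single decision learned from the same kinematic inputs.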

- Proof of principle for VBF Higgs
- L1 design, signal efficiency and rate, feasibility study for firmware
  - Designed DNN with input variables based on jets and missing energy kinematics
  - Three hidden layers with 72 nodes each
  - 4300 multiplications/inference
  - Latency ~0.5 µs, DSP usage ~40% in VU9P





#### **Further information**





The Phase-2 Upgrade of the CMS Level-1 Trigger

CERN-LHCC-2020-004; CMS-TDR-021 <a href="http://cds.cern.ch/record/2714892">http://cds.cern.ch/record/2714892</a>