30 June 2024 to 4 July 2024
FMDUL
Europe/Lisbon timezone

Parallel CPU and GPU-based connected component algorithms for event building for hybrid pixel detectors

3 Jul 2024, 14:12
1m
Main Auditorium (FMDUL)

Main Auditorium of the Faculty of Dental Medicine at the University of Lisbon (Faculdade de Medicina Dentária da Universidade de Lisboa)

Speaker

Tomas Celko (Czech Technical University in Prague (CZ))

Description

Tomáš Čelko, František Mráz, Benedikt Bergmann, Petr Mánek

Abstract:
The introduction of the Timepix3 [6] hybrid pixel detector significantly improved particle tracking thanks to its high spatial and temporal resolution. However, its high pixel-hit rate poses challenges for processing software [4]. These challenges will be amplified by multi-detector Timepix3 setups and by the increased hit-rate capability of the next-generation Timepix4 detectors [3]. At such data rates, storing all pixel hits individually and processing them “offline” can be inefficient and space-intensive. Before individual particle events seen in the sensor can be characterized, the pixel hits, which arrive partly unsorted both in time and across the pixel matrix, must first be grouped into temporally and spatially coincident groups called “clusters”. While subsequent track analysis usually requires only simple, computationally inexpensive calculations or look-up tables, the clustering step is the current bottleneck.
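To make the notion of a “cluster” concrete, the following is a minimal sequential sketch, not the implementation discussed in this contribution. It assumes that each hit is a hypothetical `(x, y, toa)` tuple, that two hits are spatially coincident when their pixels are 8-neighbours (or the same pixel), and that they are temporally coincident when their times of arrival differ by at most `max_dt`. The function name `cluster_hits` and the default window of 200 (taken here to be nanoseconds) are illustrative assumptions.

```python
from collections import deque


def cluster_hits(hits, max_dt=200.0):
    """Group (x, y, toa) hits into clusters by flood fill.

    Assumption: two hits are linked when their pixels are at most one
    step apart in x and y and their times of arrival differ by at most
    max_dt. A cluster is a set of hits connected through such links.
    Returns a list of clusters, each a sorted list of hit indices.
    """
    unvisited = set(range(len(hits)))
    clusters = []
    for start in range(len(hits)):
        if start not in unvisited:
            continue
        unvisited.remove(start)
        queue = deque([start])
        members = []
        while queue:
            i = queue.popleft()
            members.append(i)
            xi, yi, ti = hits[i]
            # Quadratic scan over the remaining hits: simple, but slow.
            linked = [
                j for j in unvisited
                if abs(hits[j][0] - xi) <= 1
                and abs(hits[j][1] - yi) <= 1
                and abs(hits[j][2] - ti) <= max_dt
            ]
            for j in linked:
                unvisited.remove(j)
                queue.append(j)
        clusters.append(sorted(members))
    return clusters


example = [(10, 10, 0.0), (11, 10, 5.0), (50, 50, 3.0),
           (11, 11, 1000.0), (12, 11, 1010.0)]
print(cluster_hits(example))  # [[0, 1], [2], [3, 4]]
```

The scan over all remaining hits makes this sketch quadratic in the number of hits, which illustrates why clustering becomes the bottleneck at high hit rates.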
In the present work, we explore parallel approaches to building the clusters online, which offers the potential for online data reduction and filtering. First, we used multiple CPU cores for real-time clustering. Despite the temporal interdependence of the clusters, we achieved data throughput that scales with the number of available cores. However, due to high CPU occupancy, we faced load-balancing issues between processing and I/O, occasionally resulting in data loss.
Additionally, we propose a new, highly parallel connected-component labeling algorithm for pseudo-real-time processing, based on a union-find data structure [2] with path compression [7]. In contrast to similar parallel connected-component algorithms [1][5], our approach exploits the zero-suppressed data encoding.
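As an illustration of the building blocks named above, here is a small sequential sketch of union-find with path compression applied directly to a sparse list of hits. It is not the parallel GPU algorithm proposed in this contribution. The hit format `(x, y, toa)`, the 8-neighbour rule, the time window `max_dt`, and the names `find`, `union` and `label_hits` are assumptions made for the example. The sketch only ever touches pixels that were actually hit, which is one simple way to take advantage of a zero-suppressed hit list.

```python
def find(parent, i):
    """Return the root of i and compress the path leading to it."""
    root = i
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[i] != root:
        nxt = parent[i]
        parent[i] = root
        i = nxt
    return root


def union(parent, a, b):
    """Merge the sets containing a and b; the smaller root index wins."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def label_hits(hits, max_dt=200.0):
    """Label each (x, y, toa) hit with the index of its cluster root.

    Assumption: hits are linked when their pixels are at most one step
    apart in x and y and their times differ by at most max_dt.
    """
    parent = list(range(len(hits)))
    # Sparse index: only pixels that were hit appear as keys.
    by_pixel = {}
    for i, (x, y, _) in enumerate(hits):
        by_pixel.setdefault((x, y), []).append(i)
    for i, (x, y, t) in enumerate(hits):
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in by_pixel.get((x + dx, y + dy), ()):
                    # j > i visits each unordered pair once.
                    if j > i and abs(hits[j][2] - t) <= max_dt:
                        union(parent, i, j)
    return [find(parent, i) for i in range(len(hits))]


example = [(10, 10, 0.0), (11, 10, 5.0), (50, 50, 3.0),
           (11, 11, 1000.0), (12, 11, 1010.0)]
print(label_hits(example))  # [0, 0, 2, 3, 3]
```

Hits that share a label belong to the same cluster. A parallel version would additionally need atomic updates of `parent`, which this sketch does not attempt to show.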
Experimentally, the GPU parallel implementation outperformed existing CPU-based algorithms, achieving throughputs of 60 to 80 million hits per second excluding I/O, a more than 20× speedup over a comparable existing single-threaded CPU implementation (see Figures 1 & 2 attached). Moreover, offloading clustering to the GPU freed the CPU for I/O handling, minimizing data loss during transfer.
Acknowledgements:
B.B. and P.M. benefited from funding from the Czech Science Foundation (GACR) under grant number GM23-04869M. The work was also supported by the Charles University Grant Agency (GAUK) under project number GAUK-142424.
References
[1] Stefano Allegretti, Federico Bolelli, and Costantino Grana. “Optimized Block-Based Algorithms to Label Connected Components on GPUs”. In: IEEE Transactions on Parallel and Distributed Systems 31.2 (2020), pp. 423–438. doi: 10.1109/TPDS.2019.2934683.
[2] Bernard A. Galler and Michael J. Fisher. “An improved equivalence algorithm”. In: Commun. ACM 7.5 (May 1964), pp. 301–303. issn: 0001-0782. doi: 10.1145/364099.364331. url: https://doi.org/10.1145/364099.364331.
[3] Xavier Llopart et al. “Timepix4, a large area pixel detector readout chip which can be tiled on 4 sides providing sub-200 ps timestamp binning”. In: Journal of Instrumentation 17.01 (Jan. 2022), p. C01044. issn: 1748-0221. doi: 10.1088/1748-0221/17/01/C01044. url: https://iopscience.iop.org/article/10.1088/1748-0221/17/01/C01044 (visited on 07/07/2023).
[4] Lukáš Meduna et al. Real-time Timepix3 data clustering, visualization and classification with a new Clusterer framework. 2019. arXiv: 1910.13356 [physics.ins-det].
[5] Daniel Peter Playne and Ken Hawick. “A New Algorithm for Parallel Connected-Component Labelling on GPUs”. In: IEEE Transactions on Parallel and Distributed Systems 29.6 (2018), pp. 1217–1230. doi: 10.1109/TPDS.2018.2799216.
[6] Tuomas Poikela et al. “Timepix3: a 65K channel hybrid pixel readout chip with simultaneous ToA/ToT and sparse readout”. In: Journal of Instrumentation 9.05 (May 2014), pp. C05013–C05013. issn: 1748-0221. doi: 10.1088/1748-0221/9/05/C05013. url: https://iopscience.iop.org/article/10.1088/1748-0221/9/05/C05013 (visited on 07/07/2023).
[7] Raimund Seidel and Micha Sharir. “Top-Down Analysis of Path Compression”. In: SIAM Journal on Computing 34.3 (2005), pp. 515–525. doi: 10.1137/S0097539703439088. url: https://doi.org/10.1137/S0097539703439088.

Author

Tomas Celko (Czech Technical University in Prague (CZ))

Co-authors

Benedikt Bergmann
František Mráz (Charles University)
Mr Petr Manek (Czech Technical University)
