Description
Providing a practical and hadron-level definition of multiple jet flavors is a long-standing problem in collider physics. Previous work has introduced a data-driven, operational definition of quark and gluon jets, but no generalization to multiple jet flavors presently exists. To address this, we introduce a practical machine-learning framework to extract any number of flavors from any number of data samples with minimal constraints. Intuitively, our procedure identifies the maximally separable categories in the data, also known as topics in the statistics literature. We demonstrate that our procedure infers the truth-level fractions of up-quark, down-quark, and gluon jets from various combinations of three samples. Then, we propose a tag-and-probe technique to extract multiple light flavors at colliders. Our findings show that the identifiability of jet flavors depends on their relative abundance in the samples and the hadron-level information available to the classifier architecture. Our work opens the door to searches for multiple jet flavors in experimental data, and it enables studies of the sample dependence of jet tagging.
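The idea of "maximally separable categories" can be made concrete in the simplest setting, two samples and two categories, which is the case covered by the earlier quark and gluon work. What follows is a minimal sketch under stated assumptions and not the multi-flavor machine-learning framework of this contribution: it assumes binned, exactly known sample distributions, and it assumes each underlying category has a bin where the other vanishes. The function names and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def reducibility(p_num, p_den):
    """Smallest bin-wise ratio p_num / p_den over bins where p_den > 0."""
    mask = p_den > 0
    return np.min(p_num[mask] / p_den[mask])

def demix(p_a, p_b):
    """Two-sample topic extraction by subtracting as much of one sample
    from the other as possible while keeping all bins non-negative."""
    k_ab = reducibility(p_a, p_b)
    k_ba = reducibility(p_b, p_a)
    topic_1 = (p_a - k_ab * p_b) / (1.0 - k_ab)
    topic_2 = (p_b - k_ba * p_a) / (1.0 - k_ba)
    # Fraction of topic_1 in each sample, from the two reducibility factors.
    denom = 1.0 - k_ab * k_ba
    f_a = (1.0 - k_ab) / denom
    f_b = k_ba * (1.0 - k_ab) / denom
    return topic_1, topic_2, f_a, f_b

# Toy categories (assumed, not from data): each has one bin where the other is zero.
quark_like = np.array([0.5, 0.3, 0.2, 0.0])
gluon_like = np.array([0.0, 0.2, 0.3, 0.5])

# Two mixed samples with different (hidden) fractions of the first category.
sample_a = 0.8 * quark_like + 0.2 * gluon_like
sample_b = 0.3 * quark_like + 0.7 * gluon_like

topic_1, topic_2, f_a, f_b = demix(sample_a, sample_b)
print("topic 1:", topic_1)
print("topic 2:", topic_2)
print("fractions of topic 1 in samples A and B:", f_a, f_b)
```

In this idealized toy, the extracted topics coincide with the two input categories and the fractions match the mixing fractions used to build the samples. With finite statistics, more than two categories, or categories that overlap in every bin, this simple bin-wise minimum is no longer sufficient.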