Description
How close are modern jet taggers to the fundamental statistical limit of their task? Generative models with tractable likelihoods offer one route to answering this question: treated as surrogates for the data distribution, they enable the construction of Neyman-Pearson-optimal classifiers. The resulting bound, however, is only as reliable as the surrogate itself. In this talk, I introduce the SUrrogate ReFerence (SURF) method to test that reliability. SURF trains a candidate generative model on samples drawn from a second, tractable surrogate and checks whether the candidate recovers the known optimal classifier of that reference. Applying SURF to top tagging, we find that different generative surrogates can yield substantially different estimates of the statistical limit, and we identify specific models that fail this consistency test. I will discuss possible explanations and what this implies for interpreting existing claims about fundamental tagging limits.
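The logic of the consistency check described above can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the talk's actual setup: the reference surrogate is a pair of 1D Gaussians with known densities, and the "candidate generative model" is simply a Gaussian fit by maximum likelihood, standing in for a trained model with tractable likelihood. The candidate passes if the classifier built from its likelihood ratio matches the reference's known Neyman-Pearson optimum.

```python
# Illustrative SURF-style consistency check (toy example, not the actual method's code).
import numpy as np

rng = np.random.default_rng(0)

# Tractable reference surrogate: known "signal" and "background" densities.
mu_s, mu_b, sigma = 1.0, -1.0, 1.0

def log_gauss(x, mu, s):
    """Log-density of a 1D Gaussian."""
    return -0.5 * ((x - mu) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))

# Step 1: sample training data from the reference surrogate.
x_s = rng.normal(mu_s, sigma, 50_000)
x_b = rng.normal(mu_b, sigma, 50_000)

# Step 2: "train" the candidate generative model on those samples
# (here: Gaussian maximum-likelihood fit per class).
ms, ss = x_s.mean(), x_s.std()
mb, sb = x_b.mean(), x_b.std()

# Step 3: build Neyman-Pearson-optimal scores (log-likelihood ratios)
# from the reference densities and from the candidate's densities.
x = np.concatenate([x_s, x_b])
y = np.concatenate([np.ones_like(x_s), np.zeros_like(x_b)])
score_ref = log_gauss(x, mu_s, sigma) - log_gauss(x, mu_b, sigma)
score_cand = log_gauss(x, ms, ss) - log_gauss(x, mb, sb)

def auc(score, labels):
    """Rank-based AUC (Mann-Whitney U statistic), no ties expected here."""
    order = np.argsort(score)
    ranks = np.empty(len(score))
    ranks[order] = np.arange(1, len(score) + 1)
    n1 = labels.sum()
    n0 = len(labels) - n1
    return (ranks[labels == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

# Step 4: consistency test — the candidate should recover the reference's
# known optimal classifier, so the two AUCs should agree closely.
print(f"reference AUC: {auc(score_ref, y):.4f}")
print(f"candidate AUC: {auc(score_cand, y):.4f}")
```

In this toy case the candidate family contains the reference, so the check passes by construction; the talk's point is that realistic generative surrogates can fail it, in which case their implied statistical limit should not be trusted.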