(1) Issues surrounding training:
    (a) When is it appropriate to use different training schemes, e.g. n-fold cross validation versus a 3-way split, etc.?
    (b) What is the correct way to deal with very rare backgrounds in the training, where a cross-section-corrected weighting leaves only a handful of events (e.g. triboson and Higgs backgrounds)?
(2) Dealing with unbalanced samples - comparisons of different ML algorithms are often shown for equally sized “signal” and “background” samples, but in real life one usually has much more of the latter than the former. How should this be dealt with in the training? Do we need to reconsider the statistics we request for signal processes (see the next point)? (A class-reweighting sketch follows this list.)
(3) Strategy for MC production requests when use of machine learning is planned - this is linked to (1) and (2).
(4) Hyperparameter tuning - what is recommended here? Grid searches, random searches, etc.? Probably strongly coupled with (1a)... (see the cross-validated random-search sketch after this list).
(5) Variable selection - what is the best way of choosing which variables to keep and which to drop? One at a time? BDT/NN variable importance measures (sketched after this list)? Studying the variation of the output with respect to each variable? Etc. What is the role of “really understanding/reproducing” a variable before it is allowed to play a decisive/important role?
(6) Dealing with variables that are undefined for some events - quite often one encounters “-999” etc. in an n-tuple for cases where the variable isn’t defined, e.g. “pT of the 5th jet”. What should one do with these cases? Some machine learning literature advocates replacing them with random noise or the mean of the other values in the column, but this doesn’t seem appropriate for our field. One could transform them into categorical variables that are always defined? Or something else? This seems particularly relevant to neural networks (BDTs can handle missing variables much more smoothly). (A sketch of the categorical-flag option follows this list.)
(7) Memory issues - this differs from case to case, but quite often one has a dataset that is too large to load into memory at once. Some algorithms are more amenable to training in batches than others (my experience is that it is easy for neural networks but less obvious for ensemble methods like BDTs). (A batched-training sketch follows this list.)
(8) Is there some kind of recommended prescription for pre-selecting events before they are passed to a multivariate algorithm?
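One option often discussed for point (2), shown here purely as an illustration and not as a recommendation: reweight the training sample so that signal and background carry equal total weight, on top of the per-event weights. This is a minimal sketch with scikit-learn; the toy dataset and the names (mc_weight, etc.) are stand-ins.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy stand-in for an unbalanced training sample: ~5% "signal" (y == 1).
X, y = make_classification(n_samples=20000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
mc_weight = np.ones(len(y))  # stand-in for cross-section-corrected event weights

# Rescale so that each class contributes half of the total weight to the loss.
w = mc_weight.astype(float).copy()
for label in (0, 1):
    mask = (y == label)
    w[mask] *= 0.5 / w[mask].sum()

clf = GradientBoostingClassifier(n_estimators=200, max_depth=3)
clf.fit(X, y, sample_weight=w)
```

Whether equal class weights (rather than, say, weights reflecting the expected yields) is actually the right target is exactly the open part of question (2).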
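For (4), and its coupling to (1a), one commonly seen setup is a random search over hyperparameters with each candidate evaluated by k-fold cross validation inside the training set, while a separate test set is kept untouched (the role of the third part of a 3-way split). This is only a sketch: the parameter ranges, dataset and split fractions are assumptions for illustration.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
# Keep a test set completely outside the tuning loop.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions={
        "n_estimators": randint(100, 500),
        "max_depth": randint(2, 6),
        "learning_rate": uniform(0.01, 0.2),
    },
    n_iter=20,        # number of random configurations tried
    cv=5,             # 5-fold cross validation inside the training set
    scoring="roc_auc",
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_)
print("test AUC:", search.score(X_test, y_test))
```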
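For the importance measures mentioned in (5), the sketch below compares a BDT's built-in ranking with a permutation-based estimate on a held-out sample; the toy data and classifier settings are again just placeholders, and neither measure is being put forward here as the recommended one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=8, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier().fit(X_train, y_train)

# Built-in (split-gain based) ranking from the BDT itself.
print("built-in importances:", clf.feature_importances_)

# Permutation importance on a sample not used in the training, i.e.
# "how much does the score degrade if this variable is scrambled?"
result = permutation_importance(clf, X_test, y_test, n_repeats=10,
                                scoring="roc_auc", random_state=0)
print("permutation importances:", result.importances_mean)
```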
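One concrete version of the “always-defined categorical variable” idea from (6), sketched with pandas: replace the sentinel with a neutral filler and add an explicit 0/1 flag, instead of letting -999 sit in the tail of the distribution. The column name and filler value are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Toy n-tuple column where -999 marks "5th jet does not exist".
df = pd.DataFrame({"jet5_pt": [45.2, -999.0, 78.1, -999.0, 102.5]})

defined = df["jet5_pt"] != -999.0
df["jet5_pt_defined"] = defined.astype(int)            # explicit 0/1 flag
df["jet5_pt"] = np.where(defined, df["jet5_pt"], 0.0)  # neutral filler value

print(df)
```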
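As an illustration of the batch-training point in (7): scikit-learn's partial_fit lets a small neural network be trained chunk by chunk without ever holding the full dataset in memory. Here iter_chunks is a hypothetical reader standing in for whatever I/O a real analysis would use to stream events from disk.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def iter_chunks(n_chunks=10, chunk_size=1000, n_features=10, seed=0):
    """Hypothetical stand-in for reading an n-tuple from disk in chunks."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        X = rng.normal(size=(chunk_size, n_features))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
        yield X, y

clf = MLPClassifier(hidden_layer_sizes=(32, 32))

for i, (X_chunk, y_chunk) in enumerate(iter_chunks()):
    if i == 0:
        # partial_fit needs the complete list of class labels on the first call
        clf.partial_fit(X_chunk, y_chunk, classes=np.array([0, 1]))
    else:
        clf.partial_fit(X_chunk, y_chunk)
```

Ensemble methods like BDTs have no equally direct equivalent of this incremental update, which is the asymmetry noted in (7).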