Question Regarding ML in Practice

I had a question about using Machine Learning in practice. When selecting a model, of course we should take into consideration which models have an advantage over others for the given problem we are trying to solve. But I was wondering whether, similar to our second project, researchers typically test a wide array of models regardless and select based on the highest performer. Could this lead to selection bias and overfitting?

Yes, in practice we test a wide range of models. If you keep using the same validation data to select both the model and its hyperparameters, you will eventually overfit to that validation set. As a modeller, it is your job not to overfit the training data, the validation data, or the test data.
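For concreteness, here is a minimal sketch of that selection protocol using scikit-learn on a synthetic dataset (the candidate models, split sizes, and random splits are just placeholders; for time-series data you would use the time-aware splits discussed below). The key point is that the test set is touched exactly once, after the validation set has picked the winner:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=1000)

# Carve out a test set that is used exactly once, at the very end.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

candidates = {"ridge": Ridge(alpha=1.0),
              "forest": RandomForestRegressor(random_state=0)}

# Models compete only on the validation set.
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = mean_squared_error(y_val, model.predict(X_val))

best_name = min(val_scores, key=val_scores.get)

# Single, final estimate on the held-out test set.
test_mse = mean_squared_error(y_test, candidates[best_name].predict(X_test))
print(best_name, val_scores, test_mse)
```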

The validation methods we investigated, like expanding-window time-series validation and combinatorial purged cross-validation, are all designed to keep us from overfitting the data.
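As a rough illustration, scikit-learn's TimeSeriesSplit gives you an expanding training window, and its gap argument drops a few observations between the train and test windows, which is a simplified stand-in for purging (the synthetic dataset and sizes below are assumptions, not the course setup):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=500)

# Each split trains on an expanding prefix and tests on the next block,
# with a small gap purged so observations near the boundary do not leak.
tscv = TimeSeriesSplit(n_splits=5, gap=5)
scores = []
for train_idx, test_idx in tscv.split(X):
    model = Ridge(alpha=1.0).fit(X[train_idx], y[train_idx])
    scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

print(np.round(scores, 4))
```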

Some models are also high-bias/low-variance, so they are less likely to overfit (for example, linear regression or regularised models like ridge and lasso). For deep learning in particular, you can apply dropout and L1/L2 regularization.
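Here is a minimal sketch of what dropout plus L1/L2 weight penalties look like in a small Keras network (the layer sizes, dropout rate, and penalty strengths are arbitrary choices for illustration, not recommended values):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l1_l2(l1=1e-5, l2=1e-4)),
    layers.Dropout(0.3),  # randomly zero 30% of activations during training
    layers.Dense(32, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.3),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```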

One way to limit overfitting to the validation data is to measure performance on multiple slices of the data (e.g., 2007-2008, 2009-2010, 2010-2011); you can even shift each window three/six/nine months in both directions to generate new splits.
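One way to generate those shifted windows, sketched with pandas (the date ranges mirror the example above; the DataFrame and the evaluation step are placeholders, not part of the course code):

```python
import pandas as pd

base_slices = [("2007-01-01", "2008-12-31"),
               ("2009-01-01", "2010-12-31"),
               ("2010-01-01", "2011-12-31")]

def shifted_slices(slices, months):
    """Shift every (start, end) window by a number of months (can be negative)."""
    offset = pd.DateOffset(months=months)
    return [(pd.Timestamp(s) + offset, pd.Timestamp(e) + offset) for s, e in slices]

all_slices = list(base_slices)
for m in (3, 6, 9, -3, -6, -9):
    all_slices += shifted_slices(base_slices, m)

# Evaluation loop (df with a DatetimeIndex, model, and evaluate are placeholders):
# for start, end in all_slices:
#     window = df.loc[start:end]
#     score = evaluate(model, window)
```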

We can also protect against overfitting by not searching the entire hyperparameter space exhaustively, but instead sampling a small random selection of it.
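A sketch of that idea with scikit-learn's RandomizedSearchCV, which draws a fixed number of random hyperparameter settings instead of sweeping a full grid (the model, the alpha distribution, and n_iter are assumptions for illustration):

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=500)

search = RandomizedSearchCV(
    estimator=Ridge(),
    param_distributions={"alpha": loguniform(1e-3, 1e2)},
    n_iter=20,                       # only 20 random draws, not the full grid
    cv=TimeSeriesSplit(n_splits=5),  # respect temporal ordering
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```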

Does this make sense? If not, feel free to rephrase the question or ask again.