The Dummy Models of Scikit-learn. Always keep a dummy by your side. | by Yoann Mocquin | Feb, 2024
If you like or want to learn machine learning with scikit-learn, check out my tutorial series on this amazing package:
Sklearn tutorial
All images by author.
Dummy models are very simple models meant to serve as a baseline for your actual models. A baseline is just a reference point to compare yourself to. When you compute your first cross-validation results to estimate your model’s performance, you usually know that the higher the score, the better, and if the score is already high on the first try, that’s great. But that is usually not the case.
What should you do if the first accuracy score is low, or at least lower than you wanted or expected? Is it because of the data? Is it because of your model? Both? How can you quickly tell whether your model is badly tuned?
Dummy models are here to answer these questions. Their complexity and “intelligence” are very low: the idea is that you can compare your models to them to see how much better you do than the “stupidest” models. Note that they do not intentionally predict stupid values; they just make the easiest, most simplistic reasonable guess. If your model performs worse than the dummy model, you should tune it or change it completely.
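To make the comparison concrete, here is a minimal sketch. The iris dataset, the `LogisticRegression` model and the `most_frequent` strategy are my own choices for illustration, not something prescribed above; any dataset and estimator would do.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative dataset (an assumption): 3 balanced classes, 50 samples each.
X, y = load_iris(return_X_y=True)

# Baseline: always predict the most frequent class seen during training.
dummy = DummyClassifier(strategy="most_frequent")
dummy_scores = cross_val_score(dummy, X, y, cv=5)

# The "actual" model we want to evaluate (also an illustrative choice).
model = LogisticRegression(max_iter=1000)
model_scores = cross_val_score(model, X, y, cv=5)

print(f"dummy accuracy: {dummy_scores.mean():.3f}")
print(f"model accuracy: {model_scores.mean():.3f}")
```

Because the classes are balanced, the dummy lands at about one third accuracy, which is the score any real model has to beat here. If your model were at or below that level, the problem would most likely be in the model or the pipeline rather than in the difficulty of the task.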
A simple example for a dummy regressor would be to always predict the mean value of the training target, whatever the input: it’s not ideal, but on average it gives a reasonable simplistic guess. If your actual model gives worse results than this very, very simple approach, you might want to review your model.
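In scikit-learn this mean-predicting baseline is available as `DummyRegressor(strategy="mean")`. The short sketch below uses a tiny made-up dataset (the numbers are purely illustrative) to show that the prediction ignores the input entirely.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Toy training data, invented for illustration only.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = np.array([10.0, 20.0, 30.0, 40.0])

# The dummy "learns" a single number: the mean of y_train.
dummy = DummyRegressor(strategy="mean")
dummy.fit(X_train, y_train)

# Whatever the inputs look like, the prediction is the same.
X_new = np.array([[-100.0], [0.0], [1000.0]])
predictions = dummy.predict(X_new)
print(predictions)  # every entry equals y_train.mean(), i.e. 25.0
```

Scoring this dummy with the same metric and cross-validation setup as your real regressor gives you the reference value to beat.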