Track Your ML Experiments. A guide to Neptune for tracking your ML experiments
Every data scientist is familiar with experimentation.
You know the drill. You get a dataset, load it into a Jupyter notebook, explore it, preprocess the data, fit a baseline model or two, and then train a first version of your final model, such as XGBoost. The first time around, maybe you leave the hyperparameters at their defaults and include 20 features. Then, you check your error metrics.
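As a rough sketch of that first pass (the file name, target column, and feature set here are placeholders, not from any real dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# Hypothetical dataset with a numeric target column
df = pd.read_csv("data.csv")
X, y = df.drop(columns=["target"]), df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# First pass: default hyperparameters, all features included
model = XGBRegressor(random_state=42)
model.fit(X_train, y_train)

rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"Baseline RMSE: {rmse:.3f}")
```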
They look okay, but perhaps your model is overfitting a bit. So you decide to tune some regularization parameters (e.g., max depth) to reduce the model’s complexity and run it again.
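Continuing the sketch above, the second pass might differ in only a couple of parameter values (the specific values here are arbitrary):

```python
# Second pass: rein in model complexity to reduce overfitting
# (reuses X_train, X_test, y_train, y_test from the previous sketch)
model = XGBRegressor(
    max_depth=4,      # shallower trees than the default of 6
    reg_lambda=2.0,   # stronger L2 regularization than the default of 1
    random_state=42,
)
model.fit(X_train, y_train)

rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print(f"RMSE after tuning: {rmse:.3f}")
```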
You see a little improvement over the last run, but perhaps you also want to:
- Add more features
- Perform feature selection and remove some features
- Try a different scaler for your features
- Tune different/more hyperparameters
As the number of different tests you want to run grows, it becomes harder and harder to remember which combination of “experiments” actually yielded the best results. You can only rerun a notebook so many times, print out the results, and copy/paste them into a Google Doc before you get frustrated.
This is where experiment tracking comes in.
As I mentioned in my article about becoming a great data scientist, having a formal way to track your experiments will make your life a lot easier and your results much clearer.
In this article, I’ll walk you through how to set up an experiment using Neptune.ai, which lets you track one project for free so you can get familiar with the process. There are plenty of other great experiment tracking tools out there, but since I’m most familiar with Neptune, that’s the one I’ll base this guide on. This is not promotional in any way; I just want to showcase what experiment tracking looks like in Python, and Neptune is my tool of choice.
After you’ve pip installed Neptune and set up your Jupyter notebook environment, you’ll need to link your notebook to the project you created in Neptune, which requires your project name and API token.
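Here’s a minimal sketch of what that connection looks like with the neptune package (the workspace/project path and token are placeholders you’d replace with your own from the Neptune UI):

```python
import neptune

# Placeholder project path and token -- copy yours from the Neptune UI
run = neptune.init_run(
    project="your-workspace/your-project",
    api_token="YOUR_API_TOKEN",  # or set the NEPTUNE_API_TOKEN env variable
)

# Log hyperparameters and metrics for this experiment run
run["parameters"] = {"max_depth": 4, "reg_lambda": 2.0}
run["metrics/rmse"] = 12.3  # replace with your actual metric

run.stop()  # flush and close the run when finished
```

Each call to neptune.init_run() creates a new run in your project’s dashboard, so every notebook execution gets recorded alongside the parameters and metrics you logged for it.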