This is how to use XGBoost in a forecasting scenario, from theory to practice

Image made by author using DALL·E-3

A couple of months ago, I was working on a research project and ran into a problem involving time series.

The problem was fairly straightforward:

“Starting from this time series with t timesteps, predict the next k values.”

For the Machine Learning enthusiasts out there, this is like writing “Hello World”: the problem is extremely well known to the community under the name “forecasting”.

The Machine Learning community has developed many techniques to predict the next values of a time series. Traditional methods include algorithms like ARIMA/SARIMA or Fourier Transform analysis; more complex approaches include Convolutional/Recurrent Neural Networks and the super famous Transformer (the T in ChatGPT stands for Transformer).

While forecasting itself is a very well-known problem, it is much less common to address forecasting with constraints.
Let me explain what I mean.

You have a time series with a set of parameters X and the time step t.
The standard forecasting problem is the following:

Image made by the author
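Before getting to the constrained version, it helps to see how the standard problem is usually turned into supervised learning: slide a window over the series so that each sample pairs t past values with the next k values. This is a minimal sketch (the window sizes and the toy series are my own illustrative choices, not from the original):

```python
import numpy as np

def make_windows(series, t, k):
    """Slice a 1-D series into (past-t-values, next-k-values) pairs.

    t = number of past timesteps used as input features
    k = number of future values to predict
    """
    X, y = [], []
    for i in range(len(series) - t - k + 1):
        X.append(series[i:i + t])        # input window
        y.append(series[i + t:i + t + k])  # targets to forecast
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)  # toy series: 0, 1, ..., 9
X, y = make_windows(series, t=4, k=2)
print(X.shape, y.shape)  # (5, 4) (5, 2)
```

Any tabular regressor (XGBoost included) can then be trained on these (X, y) pairs.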

The problem that we face is the following:

Image made by the author

So, if we consider that the input parameter has d dimensions, I want the function to be monotonic with respect to dimension 1 (for example). So how do we deal with that? How do we forecast a time series under a monotonicity constraint? The approach that we are going to describe in this post is XGBoost.

The structure of this blogpost is the following:

  1. About XGBoost: in a few lines, we will describe what XGBoost is, what its fundamental idea is, and what its pros and cons are.