CatBoost: Gradient Tree Boosting for Recommender Systems, Classification and Regression

Build your own book recommender with CatBoost Ranker

In today’s digital world, where information overload and a wide product offering are the norm, helping customers find what they need and like can be an important factor in making our company stand out and get ahead of the competition.

Recommender systems can enhance digital experiences by facilitating the search for relevant information or products. At their core, these systems leverage data-driven algorithms to analyze user preferences, behaviors, and interactions, transforming raw data into meaningful recommendations tailored to individual tastes.

In this article, I provide a detailed explanation of how Gradient Tree Boosting works for classification, regression and recommender systems. I also introduce CatBoost, a state-of-the-art library for Gradient Tree Boosting, and explain how it handles categorical features. Finally, I explain how YetiRank (a ranking loss function) works and how to implement it using CatBoost Ranker on a book recommendation dataset.

Figure 1: Recommending Books with Gradient Tree Boosting (image generated by the author with DALL-E)

As always, the code is available on GitHub.

The idea of boosting relies on the hypothesis that a combination of sequential weak learners can be as good as, or even better than, a strong learner [1]. A weak learner is an algorithm whose performance is at least slightly better than random chance and, in the case of Gradient Tree Boosting, the weak learner is a Decision Tree. In a boosting setup, each new weak learner is trained to handle the more complex observations that the previous learners could not solve, which lets it focus on the harder patterns.
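To make the idea concrete, here is a minimal sketch of boosting for regression with squared error, where each new shallow tree is fit on the residuals left by the ensemble so far. The choice of scikit-learn's DecisionTreeRegressor, the tree depth, the learning rate, and the number of trees are all illustrative assumptions, not CatBoost's actual implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def boosted_fit(X, y, n_trees=100, lr=0.1):
    """Fit a sequence of shallow trees, each on the current residuals."""
    f0 = y.mean()                  # start from a constant baseline prediction
    residual = y - f0              # what the ensemble still gets wrong
    trees = []
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=2)   # the weak learner
        tree.fit(X, residual)
        residual -= lr * tree.predict(X)            # shrink the remaining error
        trees.append(tree)
    return f0, trees


def boosted_predict(f0, trees, X, lr=0.1):
    """Sum the baseline and every tree's (shrunken) contribution."""
    return f0 + lr * sum(tree.predict(X) for tree in trees)
```

Each tree alone is a poor model, but because every new tree targets exactly the errors of its predecessors, the ensemble improves step by step.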

AdaBoost

The first boosting algorithm to achieve great success in binary classification was AdaBoost [2]. Its weak learner is a decision tree with a single split (a decision stump), and it works by putting more weight on observations that are harder to classify. Each new weak learner is added sequentially so that its training focuses on those harder patterns. The final prediction is made by a weighted majority vote of all the weak learners.
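As an illustration (not the article's code), the sketch below implements the reweighting loop just described, using scikit-learn decision stumps as weak learners; the number of rounds and the small epsilon guarding the error term are arbitrary choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=50):
    """Train AdaBoost on labels y in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)        # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)  # single-split weak learner
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)    # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-10))  # this stump's vote weight
        w *= np.exp(-alpha * y * pred)               # upweight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def adaboost_predict(stumps, alphas, X):
    """Weighted majority vote over all weak learners."""
    agg = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(agg)
```

Note how misclassified observations gain weight after every round, forcing the next stump to concentrate on them, while each stump's influence on the final vote grows with its accuracy.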