Clustered Standard Errors in AB Tests | by Matteo Courthoud

What to do when the unit of observation differs from the unit of randomization

A/B tests are the gold standard of causal inference because they allow us to make valid causal statements under minimal assumptions, thanks to randomization. By randomly assigning a treatment (a drug, ad, product, …), we can compare the outcome of interest (a disease, firm revenue, customer satisfaction, …) across subjects (patients, users, customers, …) and attribute the average difference in outcomes to the causal effect of the treatment.

Sometimes, the unit of treatment assignment differs from the unit of observation. In other words, we do not decide whether to treat each observation independently, but rather assign treatment to whole groups of observations. For example, we might treat all customers in a certain region while observing outcomes at the customer level, or treat all articles of a certain brand while observing outcomes at the article level. Usually this happens because of practical constraints. In the first example, known as a geo-experiment, it happens because cookie deprecation prevents us from tracking individual users.

When this happens, treatment assignment is no longer independent across observations. If a customer in a region is treated, the other customers in the same region are treated too. If an article of a brand is not treated, neither are the other articles of the same brand. When doing inference, we have to take this dependence into account: standard errors, confidence intervals, and p-values should be adjusted. In this article, we will explore how to do that using cluster-robust standard errors.
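To make the dependence concrete, here is a minimal sketch in plain Python, not the implementation used in the rest of the article. It simulates data where treatment is assigned at the cluster level, then compares the conventional standard error of the difference in means with a cluster-robust one (the basic sandwich estimator, often called CR0, without small-sample correction). The function names, the simulated data-generating process, and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random
from collections import defaultdict


def diff_in_means_with_se(y, d, cluster):
    """Regress y on an intercept and a binary treatment d.

    Returns (estimate, naive_se, cluster_se), where cluster_se is the
    CR0 cluster-robust standard error of the treatment coefficient.
    """
    n1 = sum(d)
    n0 = len(d) - n1
    mean1 = sum(yi for yi, di in zip(y, d) if di == 1) / n1
    mean0 = sum(yi for yi, di in zip(y, d) if di == 0) / n0
    estimate = mean1 - mean0  # OLS slope on a binary regressor

    # OLS residuals: deviation from the mean of the own treatment arm
    resid = [yi - (mean1 if di == 1 else mean0) for yi, di in zip(y, d)]

    # Conventional SE: assumes independent, homoskedastic errors
    sigma2 = sum(e * e for e in resid) / (len(y) - 2)
    naive_se = math.sqrt(sigma2 * (1 / n1 + 1 / n0))

    # Cluster-robust SE: sum each observation's contribution to the
    # slope within a cluster first, then square the cluster totals
    score = defaultdict(float)
    for e, di, g in zip(resid, d, cluster):
        score[g] += (e / n1) if di == 1 else (-e / n0)
    cluster_se = math.sqrt(sum(s * s for s in score.values()))

    return estimate, naive_se, cluster_se


def simulate(n_clusters=50, n_per_cluster=20, effect=0.5, seed=1):
    """Hypothetical data: treatment is randomized across clusters,
    and observations in a cluster share a common random shock."""
    rng = random.Random(seed)
    treated = set(rng.sample(range(n_clusters), n_clusters // 2))
    y, d, cluster = [], [], []
    for g in range(n_clusters):
        shock = rng.gauss(0, 1)  # shared by the whole cluster
        for _ in range(n_per_cluster):
            treat = 1 if g in treated else 0
            y.append(effect * treat + shock + rng.gauss(0, 1))
            d.append(treat)
            cluster.append(g)
    return y, d, cluster


y, d, cluster = simulate()
estimate, naive_se, cluster_se = diff_in_means_with_se(y, d, cluster)
print(f"estimate:   {estimate:.3f}")
print(f"naive SE:   {naive_se:.3f}")
print(f"cluster SE: {cluster_se:.3f}")
```

In this simulated setting the cluster-robust standard error comes out larger than the conventional one, because the shared shock means that observations within a cluster carry less independent information than their number suggests.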

Imagine you are an online platform interested in increasing sales. You just had a great idea: showing a carousel of related articles at checkout to incentivize customers to add other articles to their basket. To understand whether the carousel increases sales, you decide to A/B test it. In principle, you could just decide at random, for every order, whether to display the carousel or not. However, this would give…

Clustered Standard Errors in AB Tests | by Matteo Courthoud | Mar, 2024
