Underlying AB testing is the concept of the randomized controlled trial (RCT), the gold standard for establishing causality.
Below is the famous hierarchy of evidence pyramid. Essentially, the only form of evidence stronger than an RCT is a meta-analysis of RCTs. Presenting an RCT in an argument settles the argument.
There are two technical insights that enable the power of RCTs:
With a large enough sample, randomization cancels out biases – this follows from the law of large numbers. It means we don’t need to worry about differences in observable or unobservable variables: with a large sample, randomization takes care of them (the simulation sketch below illustrates this).
With randomized assignments, the difference between the treatment group and the control group is caused by the treatment.
“Caused by the treatment” is a super strong statement. In most comparisons, i.e., studies without RCTs, the difference between two groups is usually a result of selection bias rather than the treatment.
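To make the first insight concrete, here is a minimal simulation sketch (Python with NumPy; the covariate, group sizes, and numbers are made up for illustration, not taken from any real study). It randomly splits a population in half and checks how far apart the two halves are on a covariate we never measured:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hidden (unobserved) covariate, e.g. family wealth or eagerness to grow.
# We never measure it, but random assignment still balances it.
for n in [100, 1_000, 10_000, 100_000]:
    hidden = rng.normal(loc=0.0, scale=1.0, size=n)

    # Random 50/50 assignment to treatment and control.
    treated = rng.random(n) < 0.5

    gap = abs(hidden[treated].mean() - hidden[~treated].mean())
    print(f"n = {n:>7}: gap in hidden covariate between groups = {gap:.4f}")
```

As the sample grows, the gap shrinks toward zero: the two randomly assigned groups look alike on the hidden covariate even though we never measured it, which is exactly what lets us attribute any remaining difference to the treatment.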
Let’s use a quick example, which also illustrates what “random assignment” means and why it matters.
Suppose I claim that I have a magic pill that costs $100 and can increase the height of high school students by 1 inch over a year. I will show you two true results from my study:
Test group: 1000 students who voluntarily took the pill a year ago. Their average height was 60 inches a year ago and 62 inches this year.
Control group: 1000 students from the same schools with the same age. Their average height was 60 inches a year ago and 61 inches this year.
Can we conclude that this pill is effective? We all know that such a magic pill doesn’t exist, but what’s the loophole in this study?
The loophole in this study is “selection bias.” People self-select into the treatment group. Those who volunteer for the study may come from wealthier families that can afford the pill, or they may be more eager to grow taller and have tried other things besides taking the pill. Any such factor destroys the causality in this study.
But if we take 2,000 students and assign the pill randomly, we remove the selection bias. By the law of large numbers, the average characteristics (height, wealth, growth in height, eagerness to grow, etc.) of the two groups should be the same, and any difference in their height growth is guaranteed to be caused by the treatment – the pill. The sketch below simulates this example.
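Here is a minimal simulation sketch of the pill example (Python with NumPy; the “eagerness” confounder, the effect sizes, and the noise level are invented for illustration). The pill has zero true effect, yet the self-selected comparison reports a lift, while the randomized comparison correctly reports roughly none:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2_000

# Hidden confounder: students who are more "eager" to grow also do other
# things (diet, exercise, sleep) that add to their growth over the year.
eagerness = rng.uniform(0.0, 1.0, size=n)

def one_year_growth(took_pill):
    """Height growth in inches over one year. The pill has zero true effect."""
    base = 1.0                      # everyone grows about 1 inch on average
    extra = 1.0 * eagerness         # eager students grow more for other reasons
    pill_effect = 0.0 * took_pill   # the pill itself does nothing
    noise = rng.normal(0.0, 0.3, size=n)
    return base + extra + pill_effect + noise

# 1) Observational "study": eager students are more likely to volunteer.
volunteered = rng.random(n) < eagerness
growth_obs = one_year_growth(volunteered)
print("self-selected diff:", growth_obs[volunteered].mean()
      - growth_obs[~volunteered].mean())

# 2) Randomized assignment: a coin flip decides who gets the pill.
randomized = rng.random(n) < 0.5
growth_rct = one_year_growth(randomized)
print("randomized diff:   ", growth_rct[randomized].mean()
      - growth_rct[~randomized].mean())
```

Both comparisons come from the same simulated students; only the assignment rule differs, and only the randomized one recovers the true (zero) effect of the pill.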
Taking this example to product development, we can see how such mistakes happen every day if we don’t have the mindset of AB testing. For example:
Selection bias in time series:
Claim: We shipped a feature and metrics increased 10%.
Reality: The metrics would have increased 10% without the feature, for example when a Black Friday banner ships right before Black Friday (the sketch after these examples simulates this trap).
Selection bias in cross sections:
Claim: We shipped a feature, and users who use the feature saw a 10% increase in their metrics.
Reality: The users who self-select into using the feature would have seen a 10% increase without it, for example when a new button is mostly adopted by power users (ref: why most aha moments are wrong?).
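As a concrete illustration of the time-series trap, here is a minimal sketch (Python with NumPy; the 10% seasonal lift, user counts, and conversion rates are invented for illustration). The metric is rising on its own, so a before/after comparison credits the feature with a lift it did not cause, while a concurrent randomized split reports the truth:

```python
import numpy as np

rng = np.random.default_rng(7)
n_users = 50_000

# Baseline conversion rises 10% going into the holiday season,
# regardless of anything we ship.
rate_before = 0.10
rate_after = 0.11        # the seasonal lift, not the feature

feature_effect = 0.00    # the feature we shipped has zero true effect

# Before/after comparison: ship to everyone, compare last week vs this week.
conv_before = rng.binomial(1, rate_before, n_users).mean()
conv_after = rng.binomial(1, rate_after + feature_effect, n_users).mean()
print(f"before/after 'lift': {conv_after / conv_before - 1:+.1%}")

# Concurrent AB test: this week, randomly split users 50/50.
in_treatment = rng.random(n_users) < 0.5
conv_treat = rng.binomial(1, rate_after + feature_effect, in_treatment.sum()).mean()
conv_ctrl = rng.binomial(1, rate_after, (~in_treatment).sum()).mean()
print(f"AB test lift:        {conv_treat / conv_ctrl - 1:+.1%}")
```

Both comparisons measure the same launched feature; only the before/after one mistakes the seasonal trend for the feature’s impact.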
Beyond causality, AB testing is also a powerful measurement tool. Peter Drucker said, “If you can’t measure it, you can’t change it.” This is especially true in large companies with lots of management friction.
Our customer story with Rec Room is a great example. The company shipped a polished UI revamp but saw a 30%+ decrease in a key metric. Without AB testing, they wouldn’t have noticed it.
Product development is not one-time work. It is continuous iteration that accumulates small wins. But you can’t win if you can’t measure wins against losses. Once people start doing AB testing, they find out that 70% to 90% of their ideas actually don’t work.
Consequently, people who don’t do AB testing will ship many bad ideas without knowing it.
In short, AB testing is powerful and important because:
Humans are bad at attribution and are subject to many biases.
Humans are bad at predicting the outcomes of their ideas.
AB testing provides the necessary measurement and causality and keeps us honest with reality.