Statistical Significance

Statistical significance is a term used in hypothesis testing to describe a result that would be unlikely to occur by chance alone if there were no real relationship between the variables being studied. Statistical hypothesis testing is a method of making decisions using data, whether from a controlled experiment or an uncontrolled observational study.

In hypothesis testing, a p-value is used to determine statistical significance. The p-value is the probability of seeing the metric lift you observed (or a more extreme lift), assuming that the variant you're testing has no effect. A common convention is to treat a p-value below 0.05 as statistically significant. A p-value below 0.05 means there is less than a 5% chance of seeing the observed metric lift (or a more extreme one) if the variant had no effect. It is not the probability that the variant has no effect. In practice, a p-value lower than your pre-defined threshold is treated as evidence of a true effect.
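
As a rough illustration of the definition above, the sketch below computes a two-sided p-value for a difference in conversion rates using a pooled two-proportion z-test. The choice of test, the function name `two_proportion_p_value`, and the counts are all assumptions made for this example. They are not taken from any particular platform's stats engine.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_control, n_control, conv_variant, n_variant):
    """Two-sided p-value for the difference between two conversion rates."""
    p_control = conv_control / n_control
    p_variant = conv_variant / n_variant
    # Pooled rate: the best estimate of the shared rate if the variant has no effect.
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    z = (p_variant - p_control) / std_err
    # Probability of a lift at least this extreme, in either direction, under the null.
    return 2 * (1 - NormalDist().cdf(abs(z)))


ALPHA = 0.05  # threshold chosen before looking at the data

# Hypothetical counts: 500 of 10,000 control users convert vs. 570 of 10,000 variant users.
p_value = two_proportion_p_value(500, 10_000, 570, 10_000)
print(f"p-value = {p_value:.4f}, significant = {p_value < ALPHA}")
```

With these made-up counts the p-value falls below 0.05, so the lift would be treated as statistically significant at that threshold.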

Another way to assess statistical significance is with a confidence interval. The check is whether the confidence interval for the metric difference between the variant and control includes zero; if it does not, the difference is statistically significant at the corresponding level. A 95% confidence interval is constructed so that, across repeated experiments, it would cover the true difference 95% of the time. It is usually centered on the observed delta between the variant and control.
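
A minimal sketch of this check, assuming large samples so that a normal approximation is reasonable (smaller samples would usually call for a t-distribution instead). The function name `diff_confidence_interval` and the summary statistics are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(mean_c, sd_c, n_c, mean_v, sd_v, n_v, confidence=0.95):
    """Normal-approximation confidence interval for mean_v - mean_c."""
    delta = mean_v - mean_c  # observed difference; the interval is centered here
    std_err = sqrt(sd_c**2 / n_c + sd_v**2 / n_v)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return delta - z * std_err, delta + z * std_err


# Hypothetical summary statistics for a metric measured in seconds.
low, high = diff_confidence_interval(
    mean_c=120.0, sd_c=60.0, n_c=5_000,
    mean_v=123.0, sd_v=62.0, n_v=5_000,
)
excludes_zero = low > 0 or high < 0
print(f"95% CI: ({low:.2f}, {high:.2f}), excludes zero = {excludes_zero}")
```

Because the interval here lies entirely above zero, the difference would be called statistically significant at the 5% level.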

For example, if you are testing the effect of a new website design on the amount of time visitors spend on your website, you would compare the time spent by visitors who saw the new design (the treatment group) to the time spent by visitors who saw the old design (the control group). If the treatment group spent more time on average and the p-value is less than 0.05, you might conclude that the new design leads to a statistically significant increase in time spent on the website.
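
One way to sketch the website example is a permutation test: shuffle the group labels many times and count how often a difference at least as large as the observed one appears by chance. This is only one of several tests that could be used here. The session times, the function name `permutation_p_value`, and the fixed seed are assumptions for illustration.

```python
import random


def permutation_p_value(control, variant, n_resamples=10_000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(variant) / len(variant) - sum(control) / len(control)
    combined = list(control) + list(variant)
    n_variant = len(variant)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign visitors to groups at random, as if the design had no effect.
        rng.shuffle(combined)
        fake_variant = combined[:n_variant]
        fake_control = combined[n_variant:]
        delta = sum(fake_variant) / len(fake_variant) - sum(fake_control) / len(fake_control)
        if abs(delta) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


# Hypothetical time on site, in seconds, for ten visitors per group.
old_design = [52, 61, 47, 58, 65, 50, 55, 49, 60, 53]
new_design = [63, 70, 58, 72, 66, 75, 61, 69, 64, 72]

p_value = permutation_p_value(old_design, new_design)
mean_lift = sum(new_design) / len(new_design) - sum(old_design) / len(old_design)
print(f"mean lift = {mean_lift:.1f}s, p-value = {p_value:.4f}")
```

A real experiment would involve far more visitors per group; the tiny samples here only keep the example readable.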

However, statistical significance does not imply practical significance. A result can be statistically significant even when the actual difference between the groups is too small to have any practical implications, which is especially common with very large sample sizes.

Also, when running multiple tests, the chance of observing at least one statistically significant result purely by chance increases. This is known as the multiple comparisons problem, and it is often addressed by adjusting the significance level with methods such as the Bonferroni correction. The Bonferroni correction limits the probability of making any Type I error (false positive) across the experiment by dividing the significance level (α) by the number of comparisons, which here is the number of test variants.
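
The arithmetic can be sketched as follows. The variant names and p-values are made up, and the uncorrected false-positive figure assumes the three tests are independent.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the adjusted threshold and, per variant, whether it passes."""
    if not p_values:
        raise ValueError("need at least one p-value")
    adjusted_alpha = alpha / len(p_values)  # alpha divided by the number of comparisons
    decisions = {name: p < adjusted_alpha for name, p in p_values.items()}
    return adjusted_alpha, decisions


# Hypothetical p-values for three variants compared against one control.
p_values = {"variant_a": 0.030, "variant_b": 0.012, "variant_c": 0.200}

# Chance of at least one false positive with no correction, assuming independent tests.
uncorrected_risk = 1 - (1 - 0.05) ** len(p_values)

adjusted_alpha, decisions = bonferroni(p_values)
print(f"uncorrected risk = {uncorrected_risk:.3f}")
print(f"adjusted alpha = {adjusted_alpha:.4f}")
print(decisions)
```

In this example `variant_a` would pass an uncorrected 0.05 threshold but not the adjusted one, which is the trade-off the correction makes: fewer false positives at the cost of some power.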
