Group Testing Metrics

Understanding group testing metrics

Group testing metrics are the tools you use to monitor and evaluate the performance of A/B tests. They tell you whether an experiment is safe and effective, and they form the basis for data-driven decisions. By selecting and tracking the right metrics, you can confirm that your A/B tests deliver the desired results without harming your users or business.

There are three main types of group testing metrics: goal metrics, guardrail metrics, and secondary metrics. Goal metrics (also called success metrics) are the primary metrics used to measure the success of your experiment, aligning with your overall business objectives and experiment hypothesis. For example, if you're testing a new checkout flow, your goal metric might be the conversion rate.

Guardrail metrics, on the other hand, are used to monitor for any negative impacts on critical business metrics. These metrics act as an early warning system, alerting you to potential issues that could harm your product or user experience. Examples of guardrail metrics include revenue, user retention, and customer satisfaction scores.

Secondary metrics are additional metrics that provide a more comprehensive view of your experiment's impact. While not the primary focus, these metrics can offer valuable insights into user behavior and help you better understand the effects of your A/B test. Examples of secondary metrics include engagement rates, time spent on site, and user feedback.

By carefully selecting and monitoring a combination of goal, guardrail, and secondary metrics, you can gain a holistic understanding of your A/B test's performance and make informed decisions based on the data. The sections below recap secondary metrics and introduce two further categories, deterioration metrics and quality metrics, that complete the framework.
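To make the typology concrete, here is one way the checkout-flow example above might be organized. This is a hypothetical plan; the metric names are illustrative and not tied to any particular platform.

```python
# A hypothetical metric plan for a checkout-flow A/B test.
metric_plan = {
    "goal": ["checkout_conversion_rate"],  # primary success measure
    "guardrail": [                         # must not regress
        "revenue_per_user",
        "user_retention_7d",
        "customer_satisfaction_score",
    ],
    "secondary": [                         # context, not decisive
        "engagement_rate",
        "time_on_site",
        "user_feedback_score",
    ],
}
```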

Secondary metrics

  • Additional metrics tracked to gain deeper insights into experiment impact

  • Include user experience, engagement, and revenue metrics

  • Provide a more holistic view of the experiment's effects

Deterioration metrics

  • Subset of secondary metrics monitored for significant negative impact

  • Inferiority tests are applied to detect whether the treatment is worse than the control (see the sketch after this list)

  • Crucial for identifying regressions that could undermine experiment success
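Below is a minimal sketch of such an inferiority test, implemented as a one-sided two-sample z-test. The γ level, the sample data, and the assumption that "worse" means a lower mean are all illustrative; a real platform would use its own statistics engine.

```python
import numpy as np
from scipy import stats

def deterioration_detected(control: np.ndarray, treatment: np.ndarray,
                           gamma: float = 0.01) -> bool:
    """True if the treatment mean is significantly lower than the control
    mean at false-positive rate gamma (one-sided z-test)."""
    diff = treatment.mean() - control.mean()
    se = np.sqrt(treatment.var(ddof=1) / len(treatment)
                 + control.var(ddof=1) / len(control))
    p_value = stats.norm.cdf(diff / se)  # P(Z <= z) under "no difference"
    return p_value < gamma

rng = np.random.default_rng(0)
control = rng.normal(1.00, 1.0, 20_000)
treatment = rng.normal(0.95, 1.0, 20_000)  # a real 5% regression
print(deterioration_detected(control, treatment))  # expect True
```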

Quality metrics

  • Metrics that validate the experiment's quality and integrity

  • Examples: sample ratio mismatch (SRM) tests, pre-exposure bias tests (an SRM check is sketched after this list)

  • Ensure the experiment results are reliable and trustworthy
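As a concrete illustration, a sample ratio mismatch check can be run as a chi-square goodness-of-fit test on the observed assignment counts. The 50/50 expected split and the 0.001 flagging threshold below are common conventions, not requirements.

```python
from scipy import stats

def srm_detected(n_control: int, n_treatment: int,
                 expected_ratio: float = 0.5,
                 threshold: float = 0.001) -> bool:
    """True if the observed control/treatment split deviates from the
    expected split more than chance plausibly allows."""
    total = n_control + n_treatment
    expected = [total * expected_ratio, total * (1 - expected_ratio)]
    _, p_value = stats.chisquare([n_control, n_treatment], f_exp=expected)
    return p_value < threshold

print(srm_detected(10_132, 9_868))  # False: within normal variation
print(srm_detected(10_500, 9_500))  # True: investigate the assignment pipeline
```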

Designing a comprehensive experimentation strategy

To create a robust group testing metrics framework, follow these steps:

  1. Define your decision rule:

    • Treatment must be superior on at least one success metric

    • Treatment must be non-inferior on all guardrail metrics

    • No success, guardrail, or deterioration metrics should show significant deterioration

    • Quality tests should not invalidate the experiment's integrity

  2. Determine the number of metrics:

    • Let S = number of success metrics

    • Let G = number of guardrail metrics

    • Let D = number of additional deterioration metrics (success and guardrail metrics are also checked for deterioration)

    • Let Q = number of quality tests

  3. Set your risk levels:

    • α = intended false-positive rate for the overall decision

    • β = intended false-negative rate for the overall decision

    • γ = intended false-positive rate for deterioration and quality tests

  4. Apply corrections to control false-positive and false-negative risks (a worked sketch follows this list):

    • Use γ for all deterioration and quality tests

    • Use α/S for superiority tests on success metrics (a Bonferroni split across the S tests)

    • Use α/G for non-inferiority tests on guardrail metrics

    • Use β as the target false-negative rate when powering each non-inferiority and superiority test
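Here is a minimal sketch of how these pieces combine into a single ship/no-ship decision, assuming simple Bonferroni splits of α. The p-values are hypothetical outputs of the individual tests; β is used when sizing the experiment for adequate power, so it does not appear in the decision function itself.

```python
from dataclasses import dataclass

@dataclass
class ExperimentTests:
    superiority_p: list[float]      # one per success metric (S tests)
    non_inferiority_p: list[float]  # one per guardrail metric (G tests)
    deterioration_p: list[float]    # S + G + D deterioration tests
    quality_p: list[float]          # Q quality tests (e.g. SRM)

def ship_decision(t: ExperimentTests, alpha: float = 0.05,
                  gamma: float = 0.01) -> bool:
    S, G = len(t.superiority_p), len(t.non_inferiority_p)
    superior = any(p < alpha / S for p in t.superiority_p)       # >= 1 clear win
    guarded = all(p < alpha / G for p in t.non_inferiority_p)    # all non-inferior
    no_regression = all(p >= gamma for p in t.deterioration_p)   # nothing degraded
    valid = all(p >= gamma for p in t.quality_p)                 # results trustworthy
    return superior and guarded and no_regression and valid

tests = ExperimentTests(
    superiority_p=[0.004, 0.20],
    non_inferiority_p=[0.010, 0.002],
    deterioration_p=[0.30, 0.45, 0.12, 0.80],
    quality_p=[0.50],
)
print(ship_decision(tests))  # True: all four conditions hold
```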

By following this approach, you can design group testing metrics that align with your goals, monitor potential negative impacts, and ensure the reliability of your experimentation results. This comprehensive strategy empowers you to make data-driven decisions with confidence, optimizing your product for success.

Choosing effective metrics

Align metrics with business goals and user journey stages. This ensures you track indicators that matter for overall success. Consider metrics reflecting the impact on specific user journey stages, from acquisition to retention.

Balance primary and secondary metrics. Identify a main success indicator while supplementing with secondary metrics for a comprehensive view. Examples of primary metrics include revenue and conversion rate; secondary metrics may include click-through rate and average session duration.

Setting up metrics in experimentation platforms

Define and configure metrics in your chosen platform. Platforms like Eppo let you define and track the metrics that matter for your group testing, such as revenue, profit margin, and conversion rate.

Integrate your experimentation platform with data sources for accurate tracking. Eppo integrates with your existing data sources, bringing all group testing metrics into a single platform. This enables you to monitor metrics over time and analyze them across user segments.

Analyzing and acting on results

Regularly review experiment results and metrics. Dedicate time to analyze the impact of your group tests on key metrics. Identify winning variations and areas for improvement.

Make data-driven decisions based on group testing insights. Use the evidence from your experiments to inform product decisions with confidence. Implement winning variations and continue iterating based on data.

Monitor guardrail metrics closely. Keep a watchful eye on your guardrail metrics throughout the group testing process. If metrics hit concerning thresholds, pause tests and investigate potential issues.
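One way to operationalize this is a periodic guardrail check against a non-inferiority margin. The relative margin, γ level, and pause hook below are illustrative assumptions, not a prescribed policy.

```python
import numpy as np
from scipy import stats

def guardrail_breached(control: np.ndarray, treatment: np.ndarray,
                       rel_margin: float = 0.02, gamma: float = 0.01) -> bool:
    """True if we are confident the treatment is worse than the allowed
    margin (e.g. revenue per user down more than 2%)."""
    floor = control.mean() * (1 - rel_margin)  # lowest acceptable mean
    diff = treatment.mean() - floor
    se = np.sqrt(treatment.var(ddof=1) / len(treatment)
                 + control.var(ddof=1) / len(control))
    p_value = stats.norm.cdf(diff / se)        # one-sided test vs. the floor
    return p_value < gamma

def monitor(control: np.ndarray, treatment: np.ndarray) -> None:
    if guardrail_breached(control, treatment):
        print("Guardrail breached: pause the test and investigate")
        # pause_experiment()  # hypothetical platform hook
```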
