You probably understand that whether you're testing a new feature, a marketing campaign, or a business process, the success of your experiment hinges on how well you measure and understand the results. That's likely why you're searching for information on how to build an experiment scorecard: it's the medium through which you read the results of your experiment.
In this blog, we'll explore the general principles of creating effective experiment scorecards, delve into best practices, and discuss some general key metrics that should be on your radar. Initially, I’ll do my best to avoid Statsig-specific solutions and will focus on the governing principles.
Hopefully by the end, you'll have a solid foundation to apply these concepts to a wide range of experimentation scenarios. And for those curious about how technology can streamline this process, we'll include a bonus section at the end diving into how Statsig's scorecard approaches these challenges.
An experiment scorecard is more than just a collection of numbers; it's a narrative of your experiment's journey from hypothesis to conclusion.
To construct a scorecard that truly reflects the impact of your experiment, consider the following elements:
Before you can measure anything, you need to know what you're aiming for: a hypothesis. Define the goals of your experiment in clear, quantifiable terms. Are you looking to increase user engagement, boost sales, or reduce churn? Your objectives will shape the metrics you choose to track. Newer experimentation tools usually provide input fields for recording an experiment's objectives and context.
I also see a lot of customers do this in Google Docs, Notion or Jira.
Related resources:
Statsig's Experimentation Review Template
Optimizely's Experiment Plan and Results Template for Confluence
Choose metrics that directly relate to your objectives. These should be indicators that will move the needle on your goals. For example, if your objective is to improve user retention, metrics like daily active users (DAU) and churn rate are more relevant than page views.
🤖👉 Our article Picking Metrics 101 walks through some thoughts on how to approach picking metrics for an experiment.
To understand the impact of your experiment, you need a reference point. Establish baselines by measuring your metrics before the experiment begins. Set targets for what you consider success, whether it's a 5% increase in conversion rate or a 10% reduction in support tickets.
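To make the baseline-and-target idea concrete, here is a minimal Python sketch. The function names `relative_lift` and `met_target` are made up for illustration, and the numbers in the usage lines are invented, not taken from a real experiment.

```python
def relative_lift(baseline: float, observed: float) -> float:
    """Relative change of the observed value versus the baseline (0.05 means +5%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative lift")
    return (observed - baseline) / baseline


def met_target(baseline: float, observed: float, target_lift: float) -> bool:
    """Check an observed value against a target expressed as a relative lift.

    A negative target (for example -0.10 for "10% fewer support tickets")
    is met when the lift is at or below it; a positive target is met when
    the lift is at or above it.
    """
    lift = relative_lift(baseline, observed)
    if target_lift < 0:
        return lift <= target_lift
    return lift >= target_lift


# Illustrative numbers only.
conversion_ok = met_target(baseline=0.040, observed=0.043, target_lift=0.05)
tickets_ok = met_target(baseline=200, observed=170, target_lift=-0.10)
```

This only tells you whether the point estimate cleared the bar. It says nothing about statistical significance, which a real scorecard should report alongside it.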
Related reading: A tale of three companies: Why you don’t need millions of users to succeed in experimentation.
I think everyone understands that it's important to identify the KPIs that will give you the quickest insight into whether your experiment is on track.
I think everyone also understands that these should be metrics that respond rapidly to changes and can be monitored in real time or near real time. What I see people fail to execute on is the "monitored in real time or near real time" piece. This is a critical factor in the broad adoption of any experiment scorecard.
Whether it’s a custom script, an experimentation tool, or a homegrown system, if it doesn’t deliver automated results at least daily, you need to set very clear expectations with the scorecard consumer on when they can read the results to keep their trust.
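One lightweight way to set that expectation is to show a freshness label next to the results. The sketch below is a hypothetical helper, not part of any particular tool; the name `freshness_label` and the 24-hour default are assumptions you would adapt to your own pipeline.

```python
from datetime import datetime, timedelta, timezone


def freshness_label(
    last_computed: datetime,
    now: datetime,
    max_age: timedelta = timedelta(hours=24),
) -> str:
    """Return a short label telling the reader how current the results are."""
    age = now - last_computed
    if age <= max_age:
        return "fresh"
    hours_old = age.total_seconds() / 3600
    return f"stale: results are {hours_old:.0f}h old"


# Illustrative timestamps only.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
label_recent = freshness_label(datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc), now)
label_old = freshness_label(datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc), now)
```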
Lagging indicators are events or metrics that arrive with a delay and help you measure long-term trends.
A balanced scorecard also considers lagging indicators, providing a more comprehensive view of your experiment's performance. Ultimately, this depends on your type of business, but I'll share an anecdote with you:
During a proof of concept (POC) in the early days of Statsig, a customer wanted to test search functionality in their e-commerce marketplace. Another vendor suggested measuring immediate clicks (I presume because it was an easier path to implementation), which favored variant A for its higher click rate. We recommended also incorporating the conversion data, which landed in their data warehouse an hour or so after purchase, into the experiment scorecard metrics.
By analyzing both immediate and delayed outcomes, Statsig revealed that variant B, despite fewer clicks, led to higher purchase rates and customer lifetime value. We learned that variant A's increased clicks were not due to better search results but because users had to search more to find what they wanted.
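A small Python sketch shows why putting the immediate and the delayed metric side by side matters. The records below are invented to mirror the shape of the anecdote. They are not the customer's data, and the field layout is an assumption.

```python
from collections import defaultdict

# Hypothetical per-user records: (variant, search_clicks, purchased).
# Values are made up for illustration only.
records = [
    ("A", 5, False), ("A", 4, True), ("A", 6, False), ("A", 5, False),
    ("B", 2, True), ("B", 3, True), ("B", 2, False), ("B", 1, True),
]


def summarize(rows):
    """Compute one leading metric (clicks per user) and one lagging metric
    (purchase rate) for each variant."""
    totals = defaultdict(lambda: {"users": 0, "clicks": 0, "purchases": 0})
    for variant, clicks, purchased in rows:
        bucket = totals[variant]
        bucket["users"] += 1
        bucket["clicks"] += clicks
        bucket["purchases"] += int(purchased)
    return {
        variant: {
            "clicks_per_user": t["clicks"] / t["users"],
            "purchase_rate": t["purchases"] / t["users"],
        }
        for variant, t in totals.items()
    }


summary = summarize(records)
```

In this toy data, variant A wins on clicks per user while variant B wins on purchase rate, which is exactly the kind of disagreement a clicks-only scorecard would hide.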
Creating an effective scorecard is an iterative process. Here are some best practices to guide you:
Avoid cluttering your scorecard with too many metrics. Focus on a handful of measures that truly reflect the success of your experiment. This makes it easier to draw meaningful conclusions and take action. Avoid irrelevant metrics; these will mostly distract experiment scorecard consumers.
Kinda obvious but… use charts, graphs, and color-coding to make your data easy to digest. Visual representations can help stakeholders quickly grasp the results and implications of your experiment. A lot of people I talk to prefer Bayesian analysis in part for this reason; it's easier to communicate what to make of the results.
An experiment is a living process, and your scorecard should be too. Review it regularly to ensure it remains aligned with your objectives, and don't be afraid to adjust as you learn more about what's working and what isn't.
Share your scorecard with all relevant stakeholders. Clear communication ensures that everyone understands the goals, progress, and outcomes of the experiment.
Use your scorecard not just as a report card but as a learning tool. Analyze the results to understand why certain changes occurred and how you can apply these learnings to future experiments.
While the specific metrics you track will depend on your experiment's objectives, here are some relatively universal measures I see experimenters use:
Conversion rate: The percentage of users who take a desired action.
User engagement: Metrics like session duration, pages per session, or feature usage.
Revenue metrics: Sales, average order value, or lifetime value.
Customer satisfaction: Net promoter score (NPS), customer satisfaction score (CSAT), or support ticket trends.
Operational efficiency: Time to complete a process, error rates, or cost savings.
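As an example of how one of these measures might appear on a scorecard, here is a sketch of a conversion rate comparison with a 95% confidence interval. It uses the textbook normal approximation for a difference of two proportions, which is an assumption on my part and not necessarily what your experimentation tool computes. The counts are invented.

```python
import math


def conversion_rate(conversions: int, users: int) -> float:
    """Share of users who took the desired action."""
    return conversions / users


def conversion_delta_ci(c_conv, c_users, t_conv, t_users, z=1.96):
    """Absolute difference in conversion rate (treatment minus control)
    with a normal-approximation confidence interval (z=1.96 gives ~95%)."""
    p_c = conversion_rate(c_conv, c_users)
    p_t = conversion_rate(t_conv, t_users)
    # Unpooled standard error of the difference between two proportions.
    se = math.sqrt(p_c * (1 - p_c) / c_users + p_t * (1 - p_t) / t_users)
    delta = p_t - p_c
    return delta, delta - z * se, delta + z * se


# Illustrative counts only: 400/10,000 control vs. 460/10,000 treatment.
delta, low, high = conversion_delta_ci(400, 10_000, 460, 10_000)
```

If the interval excludes zero, as it does for these made-up counts, the scorecard can flag the change as statistically significant at roughly the 5% level. The approximation gets shaky with very small samples or very rare conversions.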
For those seeking a streamlined approach to experiment scorecards, Statsig offers a robust solution. Statsig's scorecard is designed to simplify the measurement process, providing real-time insights and advanced statistical treatments like CUPED (Controlled-experiment Using Pre-Experiment Data) to reduce variance and pre-exposure bias.
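For readers who haven't met CUPED before, the core idea fits in a few lines. The sketch below is the generic textbook adjustment, not Statsig's implementation. The function name `cuped_adjust` and the sample values are invented for illustration.

```python
from statistics import mean, variance


def cuped_adjust(pre, post):
    """Return CUPED-adjusted metric values.

    pre:  each user's metric value from before the experiment started
    post: the same users' metric values during the experiment
    The adjustment subtracts the part of `post` that `pre` already predicts,
    which keeps the mean unchanged while shrinking the variance.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    pre_mean = mean(pre)
    post_mean = mean(post)
    # Sample covariance between pre- and in-experiment values.
    cov = sum((x - pre_mean) * (y - post_mean) for x, y in zip(pre, post)) / (len(pre) - 1)
    theta = cov / variance(pre)
    return [y - theta * (x - pre_mean) for x, y in zip(pre, post)]


# Illustrative values only.
pre = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
post = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
adjusted = cuped_adjust(pre, post)
```

In practice the coefficient is usually estimated on data pooled across control and treatment and then applied to both groups. The stronger the correlation between the pre-experiment and in-experiment values, the larger the variance reduction.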
With Statsig, you can easily define objectives, select relevant metrics from your metrics catalog, and visualize your data with intuitive dashboards.
The platform encourages collaboration and learning, making it easier to share results and insights across teams.
By integrating best practices and essential metrics into its scorecard, Statsig provides a powerful tool for companies looking to foster a culture of experimentation, shared learning and data-driven decision-making.
Documentation: Statsig's experimentation scorecard
By adhering to best practices and focusing on essential metrics, you can ensure that your experiments deliver actionable insights and drive meaningful improvements.
Whether you're a startup or a Fortune 500 company, the principles outlined in this blog can help you measure what matters and turn experiments into strategic victories. And with solutions like Statsig's scorecard, you can harness the power of technology to make the process more efficient and effective.
Ready to take your experimentation to the next level? Embrace the art of the scorecard. 🤙