We recently hosted a virtual meetup featuring Allon Korem, CEO of Bell, and Ronny Kohavi, a widely respected thought leader and expert in experimentation.
The virtual meetup was conducted in Hebrew, so for those who aren’t fluent in the language, we have provided a summary of their conversation below. The conversation focused on three key areas essential to effective A/B testing in organizations:
Infrastructure: Building the foundation for effective experimentation is crucial. This involves setting up the right tools, processes, and systems that enable teams to run experiments seamlessly.
Experimentation culture: Establishing an organizational culture that embraces experimentation is vital for continuous growth and improvement. This involves encouraging teams to take calculated risks, learn from outcomes, and use data to drive decisions.
Learning from failures: One of the critical aspects of successful experimentation is the ability to learn from failures. Analyzing failed experiments and understanding the root causes can provide valuable insights that lead to better decision-making and future success.
Ronny shared several insights from his experience, especially during his tenure at Microsoft:
Experience at Microsoft: He highlighted that one of the biggest factors for faster shipping was having better PMs who could effectively manage and prioritize experiments and features.
Success rate of ideas: Only 33% of ideas were successful, and overcoming the cultural challenge of accepting this low success rate was crucial. Organizations must understand that failure is an integral part of the experimentation process.
Crawl, walk, run, fly model: Organizations can be at different levels of maturity in their experimentation journey. It's essential to recognize where you are and focus on progressing rather than aiming to achieve the highest level immediately.
Balancing short- and long-term goals: Optimizing solely for revenue can be short-sighted. It is important to balance short-term wins with long-term strategic goals to ensure sustainable growth.
Ronny emphasized that not every organization needs to aim for the "Fly" level in every category. For smaller organizations, the focus should be on making consistent progress rather than reaching the highest level right away.
Large organizations may have the resources to achieve "Fly" in multiple areas, but this is not feasible for everyone.
They also addressed common misconceptions and pitfalls in experimentation:
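Misinterpreting p-values: A common mistake is interpreting p-values as the probability of success. For example, a p-value of 0.01 does not mean there is a 99% chance of success. A p-value is the probability of seeing a result at least this extreme if the change truly had no effect, which is a different quantity from the probability that the change works. Understanding the correct interpretation of statistical data is crucial for making informed decisions.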
Case study: Only 12% of 1,000 experiments succeeded, illustrating the harsh reality that many organizations must accept—a high failure rate is normal.
Building features with a high failure expectation: Given the low success rate, it's advisable to build features on a small scale first, such as for one platform (desktop, Android, or iOS), and then expand based on positive results.
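To make the gap between a p-value and a "chance of success" concrete, here is a minimal sketch of one common way to estimate the probability that a statistically significant win is actually a false positive. It is not taken from the talk. It assumes the 12% figure above is used as the prior success rate, that tests run at 80% power, and that 0.025 is the false positive rate for wins in the positive direction (half of a two-sided 0.05). The function name and parameters are illustrative.

```python
def false_positive_risk(alpha, power, prior_success_rate):
    """Probability that a 'significant win' comes from an idea with no real effect.

    alpha: chance a no-effect idea still shows a significant win (assumed value)
    power: chance a truly good idea shows a significant win (assumed value)
    prior_success_rate: share of ideas that truly work (assumed from the case study)
    """
    false_wins = alpha * (1 - prior_success_rate)   # no-effect ideas that look like wins
    true_wins = power * prior_success_rate          # good ideas that are detected
    return false_wins / (false_wins + true_wins)


# Assumed inputs: one-directional alpha of 0.025, 80% power, 12% of ideas succeed.
risk = false_positive_risk(alpha=0.025, power=0.8, prior_success_rate=0.12)
print(f"Share of significant wins that are false positives: {risk:.1%}")
```

Under these assumptions, roughly 19% of significant wins would be false positives, far more than the significance threshold alone suggests. The lower the prior success rate, the larger this share becomes.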
The discussion highlighted that, in Israel, A/B testing is widely regarded as the most scientific method for experimentation, with many companies testing every feature they release. However, the focus should not be only on defensive testing (non-inferiority testing), which is used to ensure that no metric is harmed beyond a threshold.
While useful in certain situations, such as code refactoring, this approach is not generally recommended for growth-driven experimentation.
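Ronny discussed the issue of sample ratio mismatch (SRM), where the observed split of users between variants differs from the split the experiment was configured to produce. A very low p-value on the SRM check (below 1/1000, or 0.001) indicates that the experiment's results are likely unreliable. Potential causes include bot traffic, and simply rerunning the experiment is unlikely to fix SRM issues. It's crucial to investigate the underlying causes to ensure accurate and reliable results.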
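As a rough illustration of the SRM check described above, the sketch below runs a chi-squared goodness-of-fit test on the user counts of a two-variant experiment and flags a mismatch when the p-value falls below 0.001. The function names, the example counts, and the 50/50 default split are assumptions for illustration, not something shown in the talk. It uses only the standard library and relies on the fact that, with one degree of freedom, the chi-squared tail probability equals erfc(sqrt(x / 2)).

```python
import math


def srm_p_value(control_users, treatment_users, expected_control_share=0.5):
    """p-value of a chi-squared test comparing observed counts to the planned split."""
    total = control_users + treatment_users
    expected_control = total * expected_control_share
    expected_treatment = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (treatment_users - expected_treatment) ** 2 / expected_treatment)
    # Tail probability of a chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(control_users, treatment_users, expected_control_share=0.5,
            threshold=0.001):
    """True when the split is too unlikely under the planned allocation."""
    return srm_p_value(control_users, treatment_users,
                       expected_control_share) < threshold


# Hypothetical counts: a 1,500-user gap on ~100k users is flagged, a 300-user gap is not.
print(has_srm(50_000, 48_500))
print(has_srm(50_000, 49_700))
```

A flagged experiment should be investigated rather than trusted or simply rerun, for example by breaking the counts down by browser, platform, or traffic source to find where the imbalance originates.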
Ronny encouraged organizations to use experimentation tools to supplement existing processes. Setting up effective experimentation requires a significant investment, and having the right tools can streamline the process and improve outcomes.
During the Q&A session, several important topics were discussed:
Bayesian vs. frequentist approaches: They covered the differences between these two statistical approaches and their applications in experimentation.
Multivariable vs. multivariate testing: Allon mentioned that most companies don't engage in multivariable testing. Ronny added that multivariable testing is more suitable for offline, one-time scenarios, whereas multivariate testing is better for quickly testing multiple variables in a live environment.
A/A tests as best practice: Running A/A tests is strongly recommended to validate experimentation setups and ensure the reliability of testing infrastructure.
Asymmetric allocation: This approach can provide advantages to the larger group in an experiment, optimizing for specific outcomes.
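The A/A recommendation above can be illustrated with a small simulation. In an A/A test both groups get the same experience, so a healthy setup should report a "significant" difference only about as often as the significance level, roughly 5% of the time at alpha = 0.05. The sketch below is a toy model under assumed values (a 10% conversion rate, 500 users per group, a two-proportion z-test). A real A/A test would run through the actual assignment and analysis pipeline instead of simulated data.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_false_positive_rate(runs=1000, users_per_group=500,
                                    conversion_rate=0.1, alpha=0.05, seed=0):
    """Share of simulated A/A tests that come out 'significant' at the given alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(runs):
        # Both groups are drawn from the same conversion rate: there is no true effect.
        conv_a = sum(rng.random() < conversion_rate for _ in range(users_per_group))
        conv_b = sum(rng.random() < conversion_rate for _ in range(users_per_group))
        if two_proportion_p_value(conv_a, users_per_group,
                                  conv_b, users_per_group) < alpha:
            significant += 1
    return significant / runs


rate = simulate_aa_false_positive_rate()
print(f"Share of A/A tests flagged as significant: {rate:.1%}")
```

If a real platform's A/A tests are flagged much more often than the chosen alpha, that points to a problem in assignment, logging, or the statistical analysis that should be fixed before trusting A/B results.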
The discussion provided a comprehensive overview of best practices and insights for effective A/B testing and experimentation. By embracing a culture of experimentation, understanding the value of learning from failures, and leveraging the right tools and methodologies, organizations can optimize their decision-making processes and drive continuous growth.
Catch the full conversation in Hebrew, below: