Last month, I hosted Dylan Lewis, Experimentation Leader at Atlassian, for a virtual fireside chat on building a culture of experimentation. Dylan brings over two decades of experience in the domain and had a lot of great anecdotes to share.
Back in 2005, when Dylan was working at Intuit on TurboTax as the first Data Analyst on their web team, they had a learning window during tax season, from January through April.
This period gave them roughly one quarter to try out ideas, learn as much as they could, and help customers.
Leadership proposed ideas each Monday morning. The team would then build and launch those experiments by Friday and review the early results the following Monday.
The outcome at the end of the tax season was revealing:
Out of the 40 experiments they ran, 38 didn’t win. Side note: The two winning experiments came from marketers. ;)
The Highest Paid Opinion (HIPO) was not always correct.
The customers—the ones actually using the product and experiencing the treatment variants—helped them understand what would ultimately succeed.
Dylan shared, “The term HIPO was modified to 'HIPPO.' Avinash Kaushik presented it at an eMetrics conference, and Ronny Kohavi published it.” The term has since become commonplace in the vocabulary of experimentation. Dylan noted that these mascots added a lot of fun and excitement.
“We loved it, and as teams began experimenting, we sent a (stuffed) hippo to the team with a winning experiment for that week. It moved from one place to another depending on which team was winning, and they got to decorate it. By the end of tax season, the hippo would be covered in souvenirs from the teams.”
It didn't stop with the hippos; they also introduced skunks, awarded for experiments that didn't win. Engineers would write the experiment ID on the skunks, giving them to people whose experiments didn't achieve 100% success. By the end of the tax season, engineers would have collected plenty of skunks—proudly displayed on their desks in intricate dioramas!
Now at Atlassian, Dylan is working to scale a mature experimentation program. Modern-day experimentation platforms have become more robust in terms of metric trustworthiness and statistical capabilities, enabling greater experimentation velocity.
Yet Dylan noted that culture remains the biggest challenge for most organizations. A good example he shared of how culture can make a difference concerned one of the key metrics on his dashboard: the percentage of failed or restarted experiments, a figure that should ideally be low.
One of their experimentation teams was seeing a 40% restart rate. To address this, they organized a launch party, during which the experiment was turned on for those in the room, allowing the team to verify that the experience worked as expected.
One of the critical factors for success here was including someone who wasn’t part of the experiment to ensure an unbiased perspective.
The results were impactful: the restart rate dropped to 5%.
Our conversation was filled with valuable takeaways for operationalizing a culture of experimentation, touching on themes like identifying roadblocks, conducting reviews, prioritizing experiments, and ensuring trustworthiness.
This fireside chat is one you won’t want to miss! Watch below. 👇