A/B testing helps you create the best version of a product tailored for your customers. E-commerce applications are inherently primed for A/B tests because the teams running them are already heavily metrics-driven and track conversion at every point. Yet more e-commerce customers ask us every day, "What do we test?"
Let's set the basic context with the most common metrics in e-commerce and then get into some playbooks on what to test.
The most common metrics for e-commerce businesses are conversions, average order value, frequency of purchase, customer lifetime value, and customer acquisition cost.
Conversions span the entire customer journey, from search to cart, cart to checkout, checkout to purchase, and first purchase to repeat purchase
Average order value (AOV) is a function of converting customer interest to intent, say through personalized recommendations, featured deals and promotions, shipping incentives, and so on
Purchase frequency is a function of a positive customer experience, reinforced by historical familiarity, trust, and loyalty incentives
Average order value and frequency of purchase together determine customer lifetime value (CLV), which sets the upper bound for customer acquisition cost (CAC), as the sketch below illustrates
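To make that relationship concrete, here is a minimal sketch with made-up numbers and a deliberately simplified CLV model (AOV × purchase frequency × margin × expected lifetime); real CLV models typically discount future revenue and account for churn.

```typescript
// Illustration only: all numbers below are made up.
const averageOrderValue = 60;        // dollars per order
const purchasesPerYear = 4;          // orders per customer per year
const grossMargin = 0.3;             // fraction of revenue kept as margin
const expectedLifetimeYears = 3;     // how long a typical customer stays active

// Simplified CLV model: margin earned per customer over their lifetime.
const customerLifetimeValue =
  averageOrderValue * purchasesPerYear * grossMargin * expectedLifetimeYears;

// CAC should stay comfortably below CLV for the unit economics to work.
const customerAcquisitionCost = 45;

console.log(`CLV: $${customerLifetimeValue}`); // $216
console.log(
  `CLV / CAC: ${(customerLifetimeValue / customerAcquisitionCost).toFixed(1)}`
); // 4.8
```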
For experiments in e-commerce, conversion rates are often the primary metrics that determine the success or failure of an experiment. A statistically significant improvement in conversion marks the experiment as a good candidate to roll out to all users. This is because (a) conversion is actionable and sensitive to actions that a small team can test, and (b) improving conversion directionally improves output business metrics such as total gross merchandise value (GMV) that aren't as actionable at the team level.
AOV and purchase frequency often serve as guardrail metrics to ensure that the team doesn't over-index on short-term conversions at the expense of long-term customer sentiment and purchase behavior. Application performance also provides common guardrail metrics such as page load time or error and crash rates.
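As a rough illustration of what a "statistically significant improvement in conversion" means in practice, here is a minimal two-proportion z-test sketch with made-up counts. It is not how any particular experimentation platform computes results; production systems typically add variance reduction and corrections for repeated peeking.

```typescript
// Two-proportion z-test: did the treatment's conversion rate differ from control's?
function twoProportionZTest(
  convControl: number, usersControl: number,
  convTreatment: number, usersTreatment: number
): { lift: number; zScore: number } {
  const pC = convControl / usersControl;
  const pT = convTreatment / usersTreatment;
  // Pooled proportion under the null hypothesis of "no difference".
  const pPooled = (convControl + convTreatment) / (usersControl + usersTreatment);
  const standardError = Math.sqrt(
    pPooled * (1 - pPooled) * (1 / usersControl + 1 / usersTreatment)
  );
  return { lift: pT - pC, zScore: (pT - pC) / standardError };
}

// Made-up example: 4.1% vs 4.4% conversion on 100k users per group.
const { lift, zScore } = twoProportionZTest(4_100, 100_000, 4_400, 100_000);
// |z| > 1.96 corresponds to p < 0.05 for a two-sided test.
console.log(
  `Lift: ${(lift * 100).toFixed(2)} pp, z = ${zScore.toFixed(2)}, ` +
  `significant: ${Math.abs(zScore) > 1.96}`
);
```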
Borrowing from Booking.com, the first approach is to validate whether every update to the application has the expected impact. This method of "testing every atomic change" is so effective that Booking.com enjoys conversion rates 2-3x higher than the industry average. Stuart Frisby, Director of Design at Booking.com, explains their approach:
Almost every product change is wrapped in a controlled experiment. From entire redesigns and infrastructure changes to the smallest bug fixes, these experiments allow us to develop and iterate on ideas safer and faster by helping us validate that our changes to the product have the expected impact on the user experience. If it can be a test, test it. If we can't test it, we probably don't do it.
Booking.com also runs "non-inferiority tests" to identify any regressions in guardrail metrics such as error rates and customer support inquiries. For example, when they introduced the "Print Receipt" feature, they ran an A/B test to measure the impact of the new feature on Customer Support calls. The experiment suggested a 0.78% increase, less than the pre-defined threshold of 2%, marking this experiment a success.
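Booking.com hasn't published the exact mechanics of these checks, but one common way to run a non-inferiority test is to compare a one-sided upper confidence bound on the guardrail regression against the pre-defined margin. The sketch below uses made-up numbers loosely modeled on the Print Receipt example.

```typescript
// Non-inferiority check on a guardrail metric: the feature passes if the
// upper confidence bound on the increase in support-contact rate stays
// below the pre-defined margin (here, 2% relative to control).
function nonInferiorityCheck(
  contactsControl: number, usersControl: number,
  contactsTreatment: number, usersTreatment: number,
  relativeMargin: number // e.g. 0.02 for "at most 2% worse than control"
): boolean {
  const pC = contactsControl / usersControl;
  const pT = contactsTreatment / usersTreatment;
  const se = Math.sqrt(
    (pC * (1 - pC)) / usersControl + (pT * (1 - pT)) / usersTreatment
  );
  // One-sided 95% upper bound on the absolute difference (z = 1.645).
  const upperBound = (pT - pC) + 1.645 * se;
  // Convert the margin from relative (vs. control) to absolute terms.
  return upperBound < relativeMargin * pC;
}

// Made-up example: 2.000% contact rate in control vs 2.016% in treatment
// (+0.8% relative) on 2M users per group. Large samples keep the bound tight.
const passes = nonInferiorityCheck(40_000, 2_000_000, 40_320, 2_000_000, 0.02);
console.log(`Guardrail holds within the 2% margin: ${passes}`);
```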
The second approach is to set a top-down direction based on an essential, unchanging customer need. As Jeff Bezos said about Amazon.com, "We don't make money when we sell things. We make money when we help customers to make purchase decisions."
"Working backwards" from an aspirational vision but staying relentless about course-correcting is a playbook that Amazon has perfected. Perhaps what makes Amazon especially unique is its ability to embrace failure as organizational learning, making the company's unique cultural traits heavily path-dependent. Bezos has explained this in some detail:
You really can't accomplish anything important if you aren't stubborn on vision. But you need to be flexible about the details because you gotta be experimental to accomplish anything important, and that means you're gonna be wrong a lot. You're gonna try something on your way to that vision, and that's going to be the wrong thing, you're gonna have to back up, take a course correction, and try again. Most large organizations embrace the idea of invention, but are not willing to suffer the string of failed experiments necessary to get there.
A key aspect of this playbook is to ask what's the smallest big step you can take to test the riskiest assumption of your vision. Ideally, this experimental step will generate measurable results that either meaningfully validate your assumption or pointedly surprise you with an insight that changes your assumption. For example, if you're testing product pricing and assume that customers always prefer lower prices, an experiment may reveal that below a certain price range your customers begin to lose trust in your product. Not surprisingly, there is a lot of room to experiment with pricing in e-commerce!
The second level of this playbook is to recognize behavioral characteristics that help users achieve their objectives. In the example below, adding a customer testimonial improved credibility with users and increased the conversion rate by 35%.
The third level of this playbook includes tactical steps to remove unwanted friction. Any action that requires the user to slow down adds a point of friction. If it doesn't add value to the user at some point, it's unwanted friction. In the example below, reducing input fields to only what's necessary (and adding security certification with improved button copy) increased the revenue per order by 56%.
Poor presentation of information can also add unwanted friction. Here's an example where structuring product information and highlighting a single CTA increased the conversion rate by 58%.
Removing unwanted friction is an ongoing, iterative effort. One of the best books that have helped me identify and address unwanted friction in e-commerce applications is Don't Make Me Think by Steve Krug. It's a short and delightful read!
The third approach focuses on growth. For example, Pinterestâs dedicated growth team focuses on conversion, turning prospective users into active users. To improve conversion, they come up with ideas for improvements, use experiments to measure the change, and analyze results before rolling out the change to all users.
Pinterest initially set up a bottom-up approach where individual team members were tasked with coming up with new ideas but found that team members didn't know how to come up with high quality ideas. Their recent Experiment Idea Review (EIR) process now requires team members to actively build the skills for generating high quality ideas and measures their performance based on these ideas.
For example, the EIR process requires team members to clearly outline the problem, hypothesis, opportunity size, and expected impact from their proposed experiment in a document. Team-leads review these documents ahead of a team review to spot any gaps and further flesh out these ideas. After a review, the team green lights promising proposals and adds them to a backlog. With each experiment, the growth team builds more resources and improves their skills to raise the bar for the next set of ideas.
While this is admittedly the least concrete approach, think of it as a meta-approach to build the clock that tells the time rather than simply telling the time when someone asks. Leading by example and hiring thoughtful growth leaders may be the most meaningful takeaways here.
What's best for you depends on your leaders, your organizational culture, and how deeply your organization cares about incorporating data in decision making. At Statsig, we help e-commerce organizations of all sizes bootstrap their experimentation, whether it is in service of their culture, vision, or growth.
But every approach begins with running the first experiment.
If you're already using feature flags to ship software, the easiest way to run an A/B test is with no additional effort: see Statsig's smart feature gates to kick off an A/B test within minutes.
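For illustration, here is a minimal sketch of what that can look like with the statsig-js client SDK; the gate name, event name, user fields, and rendering functions are hypothetical placeholders, and your setup may differ.

```typescript
// Sketch: gate a checkout change behind a Statsig feature gate and log the
// conversion event that the experiment measures.
import Statsig from 'statsig-js';

async function renderCheckout(userID: string) {
  await Statsig.initialize('client-sdk-key', { userID });

  // Users are split between pass and fail groups by the gate, and exposures
  // are logged automatically so metric lifts can be computed per group.
  if (Statsig.checkGate('streamlined_checkout')) {
    showStreamlinedCheckout();
  } else {
    showExistingCheckout();
  }
}

// Log the conversion you care about wherever the purchase completes.
function onPurchaseCompleted(orderTotal: number) {
  Statsig.logEvent('purchase', orderTotal, { currency: 'USD' });
}

// Placeholder render functions for this sketch.
function showStreamlinedCheckout() { /* ... */ }
function showExistingCheckout() { /* ... */ }
```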
The good news about getting started is that every experiment automatically generates data that fuels new ideas for growth and experimentation.
Want to chat more about your e-commerce application and find ideas to experiment in your business? Join the conversation on the Statsig Slack channel.