A more general definition of retention looks like this:
\[ \text{Retention} = \frac{\text{Users who did action A during } T_0 \text{ and did action B during } T_1}{\text{Users who did action A during } T_0} \]
Here, A and B can be any actions, and T0 and T1 can be any time periods. The retention metric you're most likely familiar with, where A and B are the same action measured over different periods T0 and T1, is just a special case of this more general definition.
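To make the definition concrete, here is a minimal Python sketch that computes it from an event log. The `Event` record, the event names (`app_open`, `export_report`), and the half-open `[start, end)` window convention are illustrative assumptions. They are not Statsig's schema or API.

```python
from datetime import date
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """One row of a hypothetical event log."""
    user_id: str
    name: str
    day: date


def in_window(day: date, window: tuple[date, date]) -> bool:
    """True if `day` falls in the half-open window [start, end)."""
    start, end = window
    return start <= day < end


def retention(events: Iterable[Event], action_a: str, t0: tuple[date, date],
              action_b: str, t1: tuple[date, date]) -> float:
    """Share of users who did A during T0 that also did B during T1."""
    events = list(events)
    # Denominator: users who did action A during T0.
    cohort = {e.user_id for e in events
              if e.name == action_a and in_window(e.day, t0)}
    if not cohort:
        return float("nan")  # undefined when nobody did A during T0
    # Numerator: members of that cohort who did action B during T1.
    returned = {e.user_id for e in events
                if e.user_id in cohort and e.name == action_b
                and in_window(e.day, t1)}
    return len(returned) / len(cohort)


# Hypothetical events for illustration.
events = [
    Event("u1", "app_open", date(2024, 3, 4)),
    Event("u1", "app_open", date(2024, 3, 12)),
    Event("u2", "app_open", date(2024, 3, 5)),
    Event("u2", "export_report", date(2024, 3, 5)),
    Event("u2", "app_open", date(2024, 3, 13)),
    Event("u3", "export_report", date(2024, 3, 6)),
]
week1 = (date(2024, 3, 4), date(2024, 3, 11))   # T0
week2 = (date(2024, 3, 11), date(2024, 3, 18))  # T1

# Familiar special case: A and B are the same action.
print(retention(events, "app_open", week1, "app_open", week2))       # 1.0
# A is one feature's usage, B is any return to the product.
print(retention(events, "export_report", week1, "app_open", week2))  # 0.5
```

The only change between the two calls is which events play the roles of A and B. The windows and the counting logic stay the same.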
“Feature retention” describes any case in which action A or B is the use of a particular feature rather than use of the product as a whole.
This lets us measure more precisely which parts of your product most drive users to return to or churn from your platform, and which experiences in particular are habitual and durable.
Because retention is such a flexible concept, building a specific retention metric involves many context-specific decisions if it is to represent user behavior meaningfully.
Defining action A as an interaction with a specific part of the product is a powerful way to focus on the users who engaged with that part. It lets you see, at a finer granularity, which parts of your product drive users to return or churn.
However, choosing a rarer event for A shrinks the sample: we give up sample size to get a narrower, more specific set of users to measure retention for. The smaller the sample, the noisier any retention metric computed on that set of users will be.
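One way to quantify that noise is to treat the retention rate as a binomial proportion. Its standard error is √(p(1 − p)/n), so it grows as the cohort size n shrinks. The sketch below assumes that users retain independently with a common probability, which simplifies real behavior. The counts are hypothetical.

```python
import math


def retention_standard_error(retained: int, cohort_size: int) -> float:
    """Binomial standard error of a retention rate (normal approximation).

    Assumes each user in the cohort retains independently with the same
    probability, which is a simplification of real user behavior.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    p = retained / cohort_size
    return math.sqrt(p * (1 - p) / cohort_size)


# Same 40% retention rate, measured on a broad and a narrow cohort
# (hypothetical counts for illustration).
for retained, n in [(4000, 10_000), (80, 200)]:
    se = retention_standard_error(retained, n)
    print(f"n={n:>6}: rate={retained / n:.2f}, "
          f"approx. 95% interval ±{1.96 * se:.3f}")
```

At the same 40% rate, the broad cohort's approximate 95% interval is about ±1 percentage point. The 200-user cohort's interval is about ±7 points, enough to hide many real changes.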
Using a specific feature’s usage as action B can be helpful in understanding the ongoing usage of a particular part of your product. This can highlight parts of the product that users find most useful and worth returning to repeatedly.
However, there are also cases where a feature is not designed to be a habitual surface that a user will return to repeatedly.
In any introductory elements of your product—like a sign-up flow, new user experience (NUX), or tutorial—having high feature-level retention may actually indicate that a user is confused and unable to use other parts of your product effectively.
Similarly, an effectively designed settings menu should let users adjust their preferences once, or rarely, rather than becoming a surface they visit regularly.
In cases like these, repeated use of the feature does not indicate value, so we may want a less specific return event B that captures users returning to the product after these experiences.
Implementing feature retention tracking requires a clear understanding of what engagement looks like for a given feature, so that you can decide how granular a view of retention patterns to define.
Intuitively, if I see a movie in a movie theater once a week, you’d probably say that I go to the movies a lot. If I check the time on my watch once a week, you’d probably say that I check the time really infrequently. I engage in these activities with the same regularity, which is why context is so important when you are defining active engagement.
For a given feature, is it reasonable to call a user active if they use it daily? Weekly? Monthly? You might answer this question very differently for different features, and differently again for your product as a whole.
In our retention definition, these considerations shape how we define T0 and T1. When sparser activity is expected, it may be reasonable for T0 and T1 to be longer durations.
Once you’ve decided what “active” means, you’ll also want to decide how frequently to measure retention. This decision often hinges on the tension between specificity and noise in the presence of seasonality.
Seasonality occurs when behavior changes at regular time intervals. For example, since Statsig is a product that folks use for work, we typically see much more weekday usage than weekend usage. We also see less usage of Statsig over holidays.
You’re making a judgment call about whether this seasonality is meaningful to measure or should be abstracted away. For example, day-of-week seasonality can be aggregated away by setting T0 and T1 to be 7 days in duration and measuring retention on a weekly cadence.
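A small sketch of why 7-day windows absorb day-of-week seasonality uses a hypothetical work-tool user who is active on weekdays only. With 1-day windows, whether this user counts as retained depends on the weekday T0 starts on. With 7-day windows, every window contains five weekdays, so the answer no longer depends on the start day. The helper names (`active_in`, `retained`) and the activity data are illustrative assumptions.

```python
from datetime import date, timedelta


def active_in(days: set[date], start: date, length: int) -> bool:
    """True if any activity day falls in [start, start + length days)."""
    return any(start + timedelta(days=d) in days for d in range(length))


def retained(days: set[date], t0_start: date, length: int) -> bool:
    """Active in T0 and in the adjacent window T1 of the same length."""
    t1_start = t0_start + timedelta(days=length)
    return active_in(days, t0_start, length) and active_in(days, t1_start, length)


# Hypothetical work-tool user: active every weekday, never on weekends,
# over four weeks starting Monday 2024-03-04.
first = date(2024, 3, 4)
weekday_user = {first + timedelta(days=i) for i in range(28)
                if (first + timedelta(days=i)).weekday() < 5}

# Daily windows (1 day): the result depends on the weekday T0 starts on.
for start in (date(2024, 3, 5), date(2024, 3, 8)):  # a Tuesday, a Friday
    print("daily from", start.strftime("%a"), retained(weekday_user, start, 1))

# Weekly windows (7 days): same answer for every possible start day.
print(all(retained(weekday_user, first + timedelta(days=i), 7)
          for i in range(7)))
```

The daily windows mark this user as churned when T0 falls on a Friday, purely because of the weekend. The weekly windows report the user as retained for every start day.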
However, abstracting away seasonality with coarser time windows won’t always be appropriate. This is especially true when a seasonal period matters in its own right to a business’s strategy (like the holiday season for a consumer product), or when a user’s expected activity time scale is shorter than the time scale of the seasonality.
In these cases, it can be helpful to choose T0 and T1 based on an active user's expected usage patterns but compare the retention of users who would be experiencing the same seasonal effects.
For example, a fitness app may see a large cohort of users driven by New Year’s resolutions in early January, whose behavior differs from that of users active at other times. An appropriate retention metric might still be built around an expectation of daily use, but comparing it to last year’s users active in January may be more apt than comparing it to users active in December.
Instead of abstracting away seasonality, we’ve chosen a comparison point one year earlier with a similar seasonal effect.
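As a minimal sketch of this year-over-year comparison, the Python below measures day-over-day retention for an early-January cohort. It uses the same calendar date one year earlier as the baseline, rather than the preceding month. The `activity` data, user IDs, and helper names are hypothetical.

```python
from datetime import date, timedelta


def daily_retention(activity: dict[str, set[date]], t0: date) -> float:
    """Share of users active on day t0 who are also active the next day."""
    cohort = [u for u, days in activity.items() if t0 in days]
    if not cohort:
        return float("nan")
    t1 = t0 + timedelta(days=1)
    return sum(t1 in activity[u] for u in cohort) / len(cohort)


def same_season_last_year(day: date) -> date:
    """Same calendar date one year earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - 1)
    except ValueError:
        return day.replace(year=day.year - 1, day=28)


# Hypothetical per-user activity days.
activity = {
    "a": {date(2025, 1, 2), date(2025, 1, 3)},
    "b": {date(2025, 1, 2)},
    "c": {date(2024, 1, 2), date(2024, 1, 3)},
    "d": {date(2024, 1, 2)},
    "e": {date(2024, 12, 2), date(2024, 12, 3)},
}

current = date(2025, 1, 2)
baseline = same_season_last_year(current)  # 2024-01-02
naive = current - timedelta(days=31)       # 2024-12-02
print(daily_retention(activity, current),
      daily_retention(activity, baseline),
      daily_retention(activity, naive))    # 0.5 0.5 1.0
```

In this toy data, the December baseline would suggest a drop in retention that the like-for-like January comparison does not show.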
In Statsig, you can use Metrics Explorer to create feature retention dashboards for monitoring.
You can select any event as your Start Event and Return Event (A and B, respectively, in our retention definition). You can also select any unit ID, not just users. While you can’t freely choose T0, T1, and data point frequency, you can choose between a daily and a weekly scale of retention.