How to track your features' retention

Fri May 17 2024

Liz Obermaier

Data Scientist, Statsig

Retention is a broad and flexible suite of possible metrics, not just “the percent of users who returned to your product after x days.”

A more general definition of retention looks like this:

\[ \text{Retention} = \frac{\text{Users who did action A during } T_0 \text{ and did action B during } T_1}{\text{Users who did action A during } T_0} \]

Here, A and B can be any actions, and T0 and T1 can be any time periods. The retention metric you're probably most familiar with, where A and B are the same action measured over two different time periods T0 and T1, is just a special case of this more general definition.
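To make the definition concrete, here is a minimal sketch in Python that computes it from a raw event log. The event format (a list of `(user_id, action, day)` tuples), the function name `retention`, and the sample events are all assumptions made for illustration; they are not how Statsig stores or computes retention.

```python
from datetime import date


def retention(events, action_a, action_b, t0, t1):
    """Share of users who did action_a during t0 and also did action_b during t1.

    events: list of (user_id, action, day) tuples (hypothetical format).
    t0, t1: (start, end) date pairs, both ends inclusive.
    Returns None when nobody did action_a during t0.
    """
    def users_who(action, window):
        start, end = window
        return {u for u, a, d in events if a == action and start <= d <= end}

    cohort = users_who(action_a, t0)          # denominator: did A during T0
    if not cohort:
        return None
    returned = cohort & users_who(action_b, t1)  # numerator: also did B during T1
    return len(returned) / len(cohort)


# Made-up events, purely for illustration.
events = [
    ("u1", "search", date(2024, 5, 1)),
    ("u1", "search", date(2024, 5, 9)),
    ("u2", "search", date(2024, 5, 2)),
    ("u2", "checkout", date(2024, 5, 9)),
    ("u3", "search", date(2024, 5, 3)),
    ("u3", "checkout", date(2024, 5, 10)),
    ("u4", "checkout", date(2024, 5, 8)),
]
week0 = (date(2024, 5, 1), date(2024, 5, 7))
week1 = (date(2024, 5, 8), date(2024, 5, 14))

# Special case: A and B are the same action.
same_action = retention(events, "search", "search", week0, week1)
# General case: A and B differ.
cross_action = retention(events, "search", "checkout", week0, week1)
```

Note that `u4` checked out during the second week but never searched during the first, so that user is in neither the numerator nor the denominator.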

📖 Related reading: The clear definition of retention, and why yours might be ambiguous

“Feature retention” can describe any case when action A or B is the use of a particular feature instead of the use of the product as a whole.

This gives us more specificity in measuring which parts of your product matter most in driving users to return to or churn from your platform, and which experiences in particular are habitual and durable.

Choosing appropriate A, B, T0, and T1

Since retention is so flexible as a concept, creating a particular retention metric involves a lot of context-specific decisions to get a meaningful representation of user behavior.

The specificity vs sample size trade-off (choosing A)

Measuring retention with action A being an interaction with a specific part of the product is a powerful way to focus on the set of users who engaged with that part of the product. This means you can understand with more granularity which parts of your product are driving users to return or churn.

However, when we choose rarer events for A, we are reducing our sample size in order to get a narrower and more specific set of users that we’re measuring this retention for. The smaller our sample size, the noisier a retention metric based on that set of users will be.
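One rough way to put a number on that noise is to treat each user in the cohort as an independent yes/no outcome, so the retention rate behaves like a binomial proportion. That independence is an assumption, and the function name and the rates below are made up for illustration.

```python
import math


def retention_standard_error(retention_rate, cohort_size):
    """Standard error of a retention rate, treated as a binomial proportion.

    Assumes users return independently of one another (an approximation).
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return math.sqrt(retention_rate * (1 - retention_rate) / cohort_size)


# Same underlying 40% retention, measured on a broad and a narrow cohort.
broad = retention_standard_error(0.4, 10_000)  # common event for A
narrow = retention_standard_error(0.4, 100)    # rare event for A
```

Under this approximation, shrinking the cohort by a factor of 100 makes the standard error 10 times larger, so a rare action A can easily produce week-to-week swings that are noise rather than real changes in behavior.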

When repeated feature usage is more/less meaningful (choosing B)

Using a specific feature’s usage as action B can be helpful in understanding the ongoing usage of a particular part of your product. This can highlight parts of the product that users find most useful and worth returning to repeatedly.

However, there are also cases where a feature is not designed to be a habitual surface that a user will return to repeatedly.

In any introductory elements of your product—like a sign-up flow, new user experience (NUX), or tutorial—having high feature-level retention may actually indicate that a user is confused and unable to use other parts of your product effectively.

Similarly, if a settings menu is designed effectively, users should be able to tweak their settings once, or only rarely, to suit their preferences. It should not be a surface that they regularly frequent.

In cases like these, repeated use of the feature is not indicative of value. We may instead want to use a less specific return event B that captures whether users return to the product at all after these kinds of experiences.

Evaluating useful time ranges (choosing T0, T1, and how many retention data points to generate)

Implementing feature retention tracking involves a clear understanding of what user engagement looks like for a given feature in order to define a more or less granular view of retention patterns.

Intuitively, if I see a movie in a movie theater once a week, you’d probably say that I go to the movies a lot. If I check the time on my watch once a week, you’d probably say that I check the time really infrequently. I engage in both activities with the same regularity, yet you’d describe them very differently, which is why context is so important when you are defining active engagement.

For a given feature, is it reasonable to call a user active if they use it daily? Weekly? Monthly? You might answer this question very differently for different features, as well as for your product in general.

In our retention definition, these considerations shape how we define T0 and T1. When sparser activity is expected, it may be reasonable for T0 and T1 to be longer durations.

Once you’ve decided what “active” means, you’ll also want to determine how frequently you want to measure this. These decisions frequently hinge on the tension between specificity and noisiness when confronted with seasonality.

Seasonality occurs when there is a change in behavior at certain time intervals. For example, since Statsig is a product that folks use for work, we typically see much more weekday usage than weekend usage. Holidays are a time when we also see less usage of Statsig.

You’re making a judgment call about whether this seasonality is meaningful to measure or should be abstracted away. For example, day-of-week seasonality can be aggregated away by setting T0 and T1 to be 7 days in duration and measuring retention on a weekly cadence.
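As a minimal sketch of that weekly aggregation, the snippet below buckets events into 7-day windows counted from an anchor date and then measures week-over-week retention. The event format, the function names, the `open_dashboard` event, and the sample data are assumptions for illustration only.

```python
from datetime import date


def week_index(day, anchor):
    """Number of whole 7-day windows between anchor and day."""
    return (day - anchor).days // 7


def weekly_retention(events, action_a, action_b, anchor):
    """Retention with T0 = week k and T1 = week k + 1, for every complete pair.

    events: list of (user_id, action, day) tuples (hypothetical format).
    Returns {week k: retention rate}. The last observed week is skipped as a
    cohort week because its return week has no data yet.
    """
    users_by_week = {}
    for user, action, day in events:
        key = (action, week_index(day, anchor))
        users_by_week.setdefault(key, set()).add(user)
    if not users_by_week:
        return {}

    last_week = max(week for _, week in users_by_week)
    rates = {}
    for (action, week), cohort in sorted(users_by_week.items()):
        if action != action_a or week >= last_week:
            continue
        returned = cohort & users_by_week.get((action_b, week + 1), set())
        rates[week] = len(returned) / len(cohort)
    return rates


# Made-up events. The anchor is a Monday, so each window is Monday to Sunday.
anchor = date(2024, 5, 6)
events = [
    ("u1", "open_dashboard", date(2024, 5, 6)),   # week 0, Monday
    ("u2", "open_dashboard", date(2024, 5, 7)),   # week 0, Tuesday
    ("u1", "open_dashboard", date(2024, 5, 18)),  # week 1, Saturday
    ("u1", "open_dashboard", date(2024, 5, 20)),  # week 2
    ("u2", "open_dashboard", date(2024, 5, 21)),  # week 2
]
rates = weekly_retention(events, "open_dashboard", "open_dashboard", anchor)
```

Because every window contains exactly one of each weekday, a user who is active on a Monday one week and a Saturday the next still counts as retained, and the weekday versus weekend gap never shows up in the metric.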

However, it won’t always be appropriate to abstract away seasonality by measuring over larger time windows. This is especially true when a seasonal period matters in its own right to the business’s strategy (like a holiday season for a consumer product), or when the expected time scale of a user’s activity is shorter than the time scale of the seasonality.

In these cases, it can be helpful to choose T0 and T1 based on an active user's expected usage patterns, but to compare retention against users who experienced the same seasonal effects.

For example, a fitness app may see a large cohort of users with New Year’s resolutions who become active in early January and behave differently from users active at other times. An appropriate retention metric might still be centered around an expectation of daily use, but comparing it to the retention of last year’s January users may be more apt than comparing it to users active in December.

Instead of abstracting away seasonality, we’ve chosen a comparison point of a year ago with a similar seasonal effect.
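A sketch of what that comparison could look like: keep T0 and T1 at one day each, and shift the cohort window back a calendar year to find the comparison group. The data layout (a dict of user ID to active dates), both function names, and the sample users are hypothetical.

```python
from datetime import date, timedelta


def daily_retention(active_days, start, end):
    """Average day-over-day retention for cohort days in [start, end].

    active_days: dict of user_id -> set of dates the user was active
    (hypothetical format). For each day d in the window, T0 = d and
    T1 = d + 1 day. Returns None if no one was active in the window.
    """
    one_day = timedelta(days=1)
    rates = []
    day = start
    while day <= end:
        cohort = {u for u, days in active_days.items() if day in days}
        if cohort:
            returned = {u for u in cohort if day + one_day in active_days[u]}
            rates.append(len(returned) / len(cohort))
        day += one_day
    return sum(rates) / len(rates) if rates else None


def same_window_last_year(start, end):
    """Shift a date window back one calendar year.

    Assumes neither endpoint is Feb 29, which has no match in the prior year.
    """
    return start.replace(year=start.year - 1), end.replace(year=end.year - 1)


# Made-up activity, purely for illustration.
active_days = {
    "a": {date(2024, 1, 2), date(2024, 1, 3)},
    "b": {date(2024, 1, 2)},
    "c": {date(2023, 1, 2), date(2023, 1, 3)},
    "d": {date(2023, 1, 2), date(2023, 1, 3)},
}
window = (date(2024, 1, 2), date(2024, 1, 2))

this_january = daily_retention(active_days, *window)
last_january = daily_retention(active_days, *same_window_last_year(*window))
```

The metric itself stays daily, matching the expected usage pattern. Only the baseline changes, from the preceding month to the same seasonal window a year earlier.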

Using Metrics Explorer on Statsig to track feature retention

In Statsig, you can use Metrics Explorer to create feature retention dashboards for monitoring.

You can select any event as your Start Event and Return Event (A and B, respectively, in our retention definition). You can also select any unit ID, so you aren’t limited to user-level retention. While you don’t have full freedom to pick T0, T1, and data point frequency, you can choose whether to see retention on a daily or weekly scale.
