The retention module lets you visualize engagement over time by tracking how many users complete one event and later complete another. Retention analysis, also known as cohort analysis, can answer many questions about engagement over time, such as:
- After users sign up for our app, do they continue to visit day by day?
- Are my new users more active than my old users?
To make a retention report, navigate to Analyze > Retention. The drop-down at the top will allow you to choose between Retention Analysis and Usage Interval. To learn how to set up usage interval, see Usage Interval Analysis.
When the Retention Analysis option is selected, you’ll be presented with options to select the following:
Start Event: The start event is the event that you’d like to use as the foundation of the retention report; it is often some sort of one-time activation event. Create Account, Upload a Profile Picture, and Install App are all good examples of start events, though any defined event, including a repeated one such as Session, can serve as a start event.
Return Event: The return event is generally a repeated action that you want to see over time. Examples of repeated actions might be Login, Visit, or Read Article, though any event can be a return event.
Group By: By default, the group by clause is set to the date of the start event, which means the retention chart groups users by the first time they completed the start event within the date range. The group by clause supports any user-level property available to Heap, as well as behavioral properties. You can also remove the group by clause to see your users’ aggregate retention without cohorts.
Date Range: By default, Heap will set the date range of the retention report to the previous two weeks (14 days). The retention report supports the following date ranges:
- Past 7 Days
- Past 30 Days
- Past 90 Days
- Past Year
- Date to Now
- Choose a Date Range (a custom date range)
Keep in mind that the date range bounds all the numbers presented in the retention report.
Granularity: This value sets the interval used to measure how often a person completes the return event: Day, Week, or Month. By default, it is set to Day. A week or month refers to 7 days or 30 days, rather than a calendar week or a calendar month.
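To make the rolling windows concrete, here is a minimal Python sketch of how a return event could be mapped to a column, assuming the column is the number of whole rolling windows (1, 7, or 30 days, per the granularity described above) elapsed since the user’s own start event. The function name `window_index` is hypothetical and not part of Heap, and how Heap treats an event landing exactly on a window boundary is also an assumption here.

```python
from datetime import datetime, timedelta

# Rolling window lengths per granularity: a "week" is 7 days and a "month"
# is 30 days, not calendar units.
WINDOW_DAYS = {"day": 1, "week": 7, "month": 30}


def window_index(start: datetime, ret: datetime, granularity: str = "day") -> int:
    """Hypothetical helper: column (0-based) a return event falls into,
    counted in whole rolling windows from the user's own start event."""
    if ret < start:
        raise ValueError("return event precedes the start event")
    window = timedelta(days=WINDOW_DAYS[granularity])
    # timedelta // timedelta yields an int: the number of full windows elapsed.
    return (ret - start) // window


# 23 hours after the start event -> column 0; 25 hours after -> column 1.
print(window_index(datetime(2024, 3, 26, 10), datetime(2024, 3, 27, 9)))
print(window_index(datetime(2024, 3, 26, 10), datetime(2024, 3, 27, 11)))
```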
For this example, we will make a Session to Session retention report, grouped by the Date of First Event.
Understanding the Retention Report
By default, retention report results are shown as a line graph, which lets you see trends at a glance. Hover over any line in the graph to see the percentage of users who completed the return event on each day following the date of the start event, as well as the percentage increase or decrease from the previous day.
To switch to the table view, click the table icon in the Chart Type field just above the retention results.
The table view allows you to see a breakdown of activity for each cohort during each rolling window. Each row represents a cohort of users. Each column represents a rolling window from the start event, defined by the granularity.
In our example, we see the Session to Session retention report for the past 7 days by Day. Column 0 covers the first 24 hours after the user’s first event; column 1 covers 24 to 48 hours after it. Each user is counted in only one cohort per report: a user whose first session in the range was on March 26 and who came back a day later on March 27 belongs to the March 26 cohort only. Their return visit is reflected in a later column of that row rather than adding them to the March 27 cohort.
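The counting rules above (one row per user, unique users per cell) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Heap’s implementation: `retention_table` is a hypothetical name, events are plain `(user_id, datetime)` pairs, and a return event counts only if it happens strictly after the user’s start event and inside the date range.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def retention_table(start_events, return_events, range_start, range_end):
    """Hypothetical daily retention table.

    start_events / return_events: iterables of (user_id, datetime).
    Returns {cohort_date: {"size": users_in_row, "columns": {col: users}}}.
    """
    # Each user's row is the date of their FIRST start event in the range,
    # so every user lands in exactly one cohort.
    first_start = {}
    for user, ts in start_events:
        if range_start <= ts < range_end:
            if user not in first_start or ts < first_start[user]:
                first_start[user] = ts

    cohorts = defaultdict(set)
    for user, ts in first_start.items():
        cohorts[ts.date()].add(user)

    # Cells hold sets of user IDs, so repeat events in one window count once.
    cells = defaultdict(set)
    for user, ts in return_events:
        start = first_start.get(user)
        # Assumption: only events strictly after the start event, inside the range, count.
        if start is None or not (start < ts < range_end):
            continue
        col = (ts - start) // timedelta(days=1)
        cells[(start.date(), col)].add(user)

    table = {}
    for day in sorted(cohorts):
        cols = {c: len(users) for (d, c), users in cells.items() if d == day}
        table[day] = {"size": len(cohorts[day]), "columns": dict(sorted(cols.items()))}
    return table


# Session to Session: the same events serve as start and return events.
sessions = [
    ("a", datetime(2024, 3, 26, 9)),
    ("a", datetime(2024, 3, 27, 9)),
    ("a", datetime(2024, 3, 27, 10)),
    ("b", datetime(2024, 3, 27, 12)),
]
table = retention_table(sessions, sessions, datetime(2024, 3, 20), datetime(2024, 4, 1))
print(table)
```

In this made-up data, user `a` lands only in the March 26 row, and their two sessions on March 27 are counted once, in column 1.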
Hover over any cell to see a description of the data.
By default, a user is counted in every column (window) in which they complete the return event. Selecting the First Time checkbox and re-running the query changes the data to count only each user’s first return event.
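The difference between the default count and the First Time option can be illustrated with a small, hypothetical Python helper. It assumes you already know, for each user in a cohort, the columns in which they did the return event; `column_counts` and its `first_time` flag are illustrative names, not Heap settings or APIs.

```python
from collections import Counter


def column_counts(user_columns, first_time=False):
    """Hypothetical helper: count users per column for one cohort (row).

    user_columns maps user_id -> list of column indices of their return events.
    Default: a user counts once in every column where they returned.
    first_time=True: a user counts only in the column of their first return.
    """
    counts = Counter()
    for cols in user_columns.values():
        if not cols:
            continue  # never returned, so appears in no column
        if first_time:
            counts[min(cols)] += 1
        else:
            counts.update(set(cols))  # set() keeps one count per column
    return dict(sorted(counts.items()))


row = {"a": [1, 1, 3], "b": [2], "c": []}
print(column_counts(row))                   # {1: 1, 2: 1, 3: 1}
print(column_counts(row, first_time=True))  # {1: 1, 2: 1}
```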
For an explanation of why the numbers in the first row are so high, see the FAQ Why is the first row in my retention report always so high? at the bottom of this article.
Date Range, Granularity and Retention
Retention is relative to the date range and the granularity chosen. Staying with the above example, if you extend the date range to include the previous week, a user who appeared in the March 26 cohort may now appear in an earlier cohort, because their first start event within the new, wider range may fall before March 26. In our example, the number of users in the March 26 cohort drops from 4,557 to 2,626.
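Why widening the date range can move a user into an earlier cohort can be shown with a minimal Python sketch, assuming a user’s row is simply the date of their first start event inside the range. The function `cohort_of` and the session dates are hypothetical.

```python
from datetime import date


def cohort_of(start_dates, range_start, range_end):
    """Hypothetical helper: the user's cohort (row) is the date of their
    first start event inside [range_start, range_end], or None if none."""
    in_range = [d for d in start_dates if range_start <= d <= range_end]
    return min(in_range) if in_range else None


# One hypothetical user with sessions on Mar 20, Mar 26, and Mar 28.
sessions = [date(2024, 3, 20), date(2024, 3, 26), date(2024, 3, 28)]

print(cohort_of(sessions, date(2024, 3, 26), date(2024, 4, 1)))  # 2024-03-26
print(cohort_of(sessions, date(2024, 3, 19), date(2024, 4, 1)))  # 2024-03-20
```

With the wider range, this user leaves the March 26 row, which is why that cohort shrinks when the range is extended.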
Let’s change the date range to the past month and granularity to week. Each row now shows a calendar week in which the users did their first start event in the range selected. Each column is now a rolling 7-day window in which the user can do subsequent return events.
Because the return window is measured from each user’s own first event, not from the start of the calendar week, a user who did their first event in the week of Mar 24 – Mar 30 has 7 days from that event to do the return event and be counted in column 0. For a user whose first event was late on Mar 30, that window extends through April 6, past the end of the cohort’s calendar week. This is an important distinction to remember when analyzing 7-day (weekly) or 30-day (monthly) retention.
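The per-user window can be checked with a short Python sketch, assuming column 0 at weekly granularity means "less than 7 × 24 hours after the user’s own first event". The helper name `weekly_column` and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

WEEK = timedelta(days=7)


def weekly_column(first_event: datetime, return_event: datetime) -> int:
    """Hypothetical helper: rolling 7-day window (column) of a return event,
    measured from this user's first event rather than from a calendar week."""
    return (return_event - first_event) // WEEK


# Two users in the same Mar 24 - Mar 30 cohort week, both returning on April 6.
early = datetime(2024, 3, 24, 1, 0)
late = datetime(2024, 3, 30, 23, 0)
ret = datetime(2024, 4, 6, 12, 0)

print(weekly_column(late, ret))   # 0: still inside the late user's first window
print(weekly_column(early, ret))  # 1: the early user's first window ended Mar 31
```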
Retention analysis is most powerful when you group by cohorts beyond the default Date of Start Event. A cohort is a group of people who share a common characteristic over a period of time. For instance, in the previous example, users whose first session in the range fell on the same day make up one cohort. Almost any user-level property you can imagine can define a cohort: examples include location and event history, and you can even define cohorts based on custom properties sent to Heap via our Custom Identify API.
Heap counts only unique users in cohorts, not the total number of events or sessions. Each person is counted once in a given cohort and belongs to only one cohort (row) per retention report. However, a user is not limited to one cell within a row: if a person repeats the return event in several windows, that activity is reflected in multiple columns.
The example below answers the question “how does a particular user activity affect retention?” In this case, we want to know if users who upload a picture to our app are better retained over time than those who don’t. Grouping by this property creates two cohorts.
As you can see from the result, people who upload are better retained: 47.13% of users who did our upload event came back a week after sign-up compared to 7.11% of those who did not. This suggests that we could pay more attention to our onboarding UX to make sure users are encouraged to upload a picture early in the process.
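The percentages in a grouped report are each cohort’s returning users divided by its unique users. The Python sketch below reproduces that arithmetic on a tiny made-up data set (not the data behind the 47.13% figure); `retention_rate` and the `uploaded` flag are illustrative names.

```python
def retention_rate(cohort_users, returned_users):
    """Hypothetical helper: percentage of a cohort's unique users who did the
    return event in a given column, rounded to two decimals."""
    cohort = set(cohort_users)
    if not cohort:
        return 0.0
    kept = len(cohort & set(returned_users))
    return round(100 * kept / len(cohort), 2)


# Made-up users, grouped by whether they did the upload event.
uploaded = {"u1": True, "u2": True, "u3": False, "u4": False, "u5": False}
returned_week_1 = {"u1", "u3"}

cohorts = {
    flag: [user for user, did in uploaded.items() if did == flag]
    for flag in (True, False)
}
rates = {flag: retention_rate(users, returned_week_1) for flag, users in cohorts.items()}
print(rates)  # {True: 50.0, False: 33.33}
```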
If we want to analyze our data without cohorts, we can do that, too! Click the x next to the group by clause to see the average retention curve across all users within the range selected.
Getting Value From Retention Analysis
The ability to derive insights from retention analysis goes beyond knowing how to generate and read a report; it depends on knowing which questions to ask. We recognize that deciding which metrics to analyze can be difficult, so we’ve created a short list of tips below. Please don’t hesitate to reach out to email@example.com if you would like some guidance on how to analyze your data.
Tip 1: Retention analysis is particularly useful for making sure that changes to your application actually drive engagement. If you’ve made product improvements over time, you can use retention analysis to see if these changes have made an impact. You will most likely want to make sure that your newest users are more engaged, as this signals that product iterations are driving retention.
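Tip 2: Retention analysis lets you analyze engagement even when it is masked by growth metrics: a fast-growing user base can keep total activity rising while each new cohort is retained less well. 52 Weeks of UX has a great write-up on this topic.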
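A toy calculation in Python shows how growth can hide falling retention. The cohort sizes and retention rates are invented for illustration, and `week1_returners` is a hypothetical helper.

```python
# Invented weekly cohorts: (new users, week-1 retention in percent).
cohorts = [(100, 40), (200, 35), (400, 30)]


def week1_returners(cohorts):
    """Hypothetical helper: users from each cohort who came back in week 1."""
    return [new_users * pct // 100 for new_users, pct in cohorts]


returners = week1_returners(cohorts)
rates = [pct for _, pct in cohorts]
print(returners)  # [40, 70, 120]: returning users keep rising...
print(rates)      # [40, 35, 30]: ...while each cohort's retention falls
```

Looking only at the count of returning users would suggest engagement is improving; the per-cohort rates, which are what a retention report shows, tell the opposite story.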
Tip 3: It is best to define the start event as a one-time event to get clear results when grouping by the Date of Start Event. If you define the start event as a repeated event, such as Session, the first row of your retention table will likely be inflated. This is because each user’s cohort is defined by the first time they completed the start event within the selected date range, which most likely does not cover all time. Long-standing users whose activity began before the range are therefore assigned to the first day of the range, so the first row is artificially inflated by the activity of your power users.
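The inflation effect can be reproduced with a minimal Python sketch, assuming (as described above) that a user’s cohort is their first start event within the date range. The users, dates, and the `first_in_range` helper are hypothetical.

```python
from datetime import date, timedelta


def first_in_range(event_dates, range_start, range_end):
    """Hypothetical helper: first event date inside the range, or None."""
    hits = [d for d in event_dates if range_start <= d <= range_end]
    return min(hits) if hits else None


range_start, range_end = date(2024, 3, 20), date(2024, 3, 26)

# A long-time power user with a session every day from Jan 1 through Mar 30,
# who created their account on Jan 1 (before the range).
power_sessions = [date(2024, 1, 1) + timedelta(days=i) for i in range(90)]
power_signup = [date(2024, 1, 1)]

# Start event = Session: the power user is bucketed into the first row.
print(first_in_range(power_sessions, range_start, range_end))  # 2024-03-20

# Start event = Create Account (one-time): the power user is in no cohort.
print(first_in_range(power_signup, range_start, range_end))  # None
```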
Tip 4: Although we use retention and engagement synonymously in our documentation, keep in mind the difference between the two: retention measures whether users come back, while engagement measures how actively they use your product.
Frequently Asked Questions
Why is the first row in my retention report always so high?
This happens because cohorts are mutually exclusive. For example, if a user is already counted in row 0, they will not show up in row 1, even if they were active during that time frame. No user is double-counted.
Above the first row in your chart are implicit rows that we don’t show, one for each period before your date range, which would have further segmented your user activity. Because your retention report only counts user activity within the selected time frame, anyone who would have been counted in one of those earlier rows is instead bucketed into the first row.
Why does the count of users whose first event was in the time range decrease in each row?
Depending on what you’re analyzing, the number of users counted may drop as you look at later cells to the right. This is because which cells a user can be counted in depends on when they completed the start event. For example, say we are analyzing the use of a specific tool over the past year, where each column represents a month. Users who started using the tool less than a year ago can’t appear in the later columns, because that much time hasn’t passed for them yet. So if a user started using the tool 6 months into the year, they can’t be counted in the cells that represent months 7 through 12 of use.
To see this more clearly, try grouping these users by the date they completed the start event, using the group by modifier Date of Start Event.
The report will update with the time window of the start event listed next to each row. You’ll notice that the number of cells decreases by one with each row, until the last row has only a single ‘0’ cell.
Each cell in a row counts the users who were active in the window, measured from their start event, that the column header designates. To return to our example of analyzing use of a tool over the past year, the first row represents users who started using the tool 12 months ago, the second row users who started 11 months ago, the third row users who started 10 months ago, and so on. So the third row, which represents users with ten months of possible activity, has no cells for 11 or 12 months, because not enough time has passed for anyone in that row to be counted there.
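The shrinking rows follow from simple arithmetic. In the Python sketch below, assuming the date range splits evenly into N cohorts of one granularity unit each (oldest first), row i can show only N - i columns, because the later windows have not elapsed yet for newer cohorts. `available_columns` is a hypothetical helper.

```python
def available_columns(num_cohorts):
    """Hypothetical helper: number of cells each row can show (oldest row
    first), assuming num_cohorts equal-length cohorts fill the date range."""
    return [num_cohorts - i for i in range(num_cohorts)]


# Twelve monthly cohorts over the past year.
cells = available_columns(12)
print(cells)  # [12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
```

The third row has 10 cells (columns 0 through 9), and the last row has only column 0, matching the triangle described above.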