This feature is only available to customers on the Premier plan or the Pro plan as an add-on. To upgrade, contact your Customer Success Manager or sales@heap.io.
This article lists all of the common queries in our Heap Connect Guide, which you can copy and paste into your BI Tool or run directly on your cluster. However, some queries require you to update the property name, event names, or date ranges to fit your needs.
These queries are also available in Mode. The visualizations may take a few seconds to load.
Need help? Please post in Community or contact us via the Get support page in Heap.
Basic Queries
Event Counts
SELECT device_type
,plan
,COUNT(session_id)
FROM main_production.sessions AS sessions
INNER JOIN main_production.users AS users ON sessions.user_id = users.user_id
GROUP BY 1, 2
For steps to set up a chart of this in Heap, see Setting Up a Usage Over Time Chart.
Average Property Value
SELECT [Property_to_Group_By], AVG([Numeric_Property])
FROM main_production.[event_name]
GROUP BY 1
ORDER BY 1
Aggregate Metrics
In this section, you’ll find a variety of Heap Connect queries to measure aggregate metrics for your site, including average time, top events, and more. The value of these queries increases as you segment them by user personas, and the same logic can also be applied to calculate more specific statistics.
Top 10 Events
The top 10 events shed light on how users are interacting with your app. Looking at this list generates a lot of questions: Why are users clicking this button? Why are users viewing this page? Are these two actions correlated? Does the high event count come from users’ success within the product, or is this caused by confusion?
Also available in Mode: Top 10 Events
You can modify the WHERE clause to adjust the date range. You can include the date in the SELECT statement and GROUP BY clause to measure how the most frequently performed actions change over time. You can also adjust the number of events you are looking at by modifying the LIMIT.
SELECT
all_events.event_table_name AS "Event",
COUNT(*) AS "Event Count"
FROM main_production.all_events AS all_events
WHERE time >= DATEADD(day,-30, getdate())
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
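For example, a minimal sketch of the date modification described above (the schema name and day-level granularity are assumptions; adjust them to your setup) groups the counts by day:
SELECT
TO_CHAR(time, 'YYYY-MM-DD') AS "Day",
all_events.event_table_name AS "Event",
COUNT(*) AS "Event Count"
FROM main_production.all_events AS all_events
-- modify your date range here
WHERE time >= DATEADD(day, -30, getdate())
GROUP BY 1, 2
ORDER BY 1, 3 DESC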
To see your top events in Heap, check out the raw events page.
Average Time on Site
This query is primarily beneficial if you have a content site, where there aren't many user interactions to capture while reading and avid readers don't always comment or share. Although this is only a proxy for engagement, drastic changes in time on site can cue you to investigate how users are engaging with your product.
Also available in Mode: Average Time on Site
You can tweak this query to analyze average time between events, or break it down into different segments of users to get a better understanding of your site’s usage.
SELECT ROUND(AVG(duration)::FLOAT, 4) "Average Session time in Minutes"
FROM (
SELECT session_id || '-' || user_id AS "unique"
,MIN(time) AS "start"
,MAX(time) AS "last"
,((DATEDIFF('milliseconds', MIN(time), MAX(time))::FLOAT / 1000 ) / 60) AS "duration"
FROM main_production.all_events
-- modify your date range here
WHERE time >= DATEADD('month', - 2, getDate())
GROUP BY 1
) durations
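As a sketch of the segmentation mentioned above, the variant below joins back to the users table and groups by a user-level property; the plan column is borrowed from the Event Counts example earlier and stands in for whichever property you want to segment by:
SELECT plan
,ROUND(AVG(duration)::DECIMAL, 4) AS "Average Session Time in Minutes"
FROM (
SELECT all_events.session_id || '-' || all_events.user_id AS "unique"
,users.plan
,((DATEDIFF('milliseconds', MIN(all_events.time), MAX(all_events.time))::FLOAT / 1000) / 60) AS duration
FROM main_production.all_events AS all_events
JOIN main_production.users AS users ON all_events.user_id = users.user_id
-- modify your date range here
WHERE all_events.time >= DATEADD('month', -2, getDate())
GROUP BY 1, 2
) durations_by_plan
GROUP BY 1
ORDER BY 2 DESC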
Average Time on Page
WITH ordered_pageviews AS
(SELECT *,
row_number() OVER (PARTITION BY user_id, session_id ORDER BY TIME ASC) AS rn
FROM main_production.pageviews
WHERE TIME >= DATEADD('month', - 1, getDate()))
SELECT a.path,
avg(pageview_time::DECIMAL) / 60 as avg,
round(percentile_cont(0.25) within group (order by pageview_time) / 60, 2) as p25,
round(percentile_cont(0.5) within group (order by pageview_time) / 60, 2) as median,
round(percentile_cont(0.75) within group (order by pageview_time) / 60, 2) as p75,
round(percentile_cont(0.9) within group (order by pageview_time) / 60, 2) as p90
FROM
(SELECT p1.session_id,
p1.user_id,
p1.path,
datediff('seconds', p1.time, p2.time) AS pageview_time
FROM ordered_pageviews p1
JOIN ordered_pageviews p2 ON p1.session_id = p2.session_id
AND p1.user_id = p2.user_id
AND p1.rn + 1 = p2.rn
-- optional filter to limit paths you're analyzing
-- WHERE p1.path !~ '/app'
GROUP BY p1.session_id,
p1.user_id,
p1.path,
p1.time,
p2.time) a
GROUP BY 1
ORDER BY 2 DESC;
For steps to set up a chart of this in Heap, see How do I see time spent on page?
Average Events per Session
The average events per session metric can give you insight into aggregate behavior.
This query can also be modified to analyze behavior across segments and over time, allowing you to answer questions such as: Does this number change with a product launch or tweaks to the onboarding flow? Do different types of users interact differently? Changes in the average can prompt investigation into what is really happening within your app.
Also available in Mode: Average Events per Session
SELECT Round(Count(DISTINCT event_id) :: DECIMAL / Count(DISTINCT session_id), 2
) AS
"average"
FROM main_production.all_events
-- modify your date range here
WHERE TIME >= Dateadd('day', -30, Getdate())
Average Events per Session Over Time
To modify this query to calculate the average number of events per session over the past year or past six months, update the WHERE clause to include the time range and add a GROUP BY clause to address the granularity.
Also available in Mode: Average Events Per Session Over Time
SELECT TO_CHAR(main_production.all_events.time, 'YYYY-MM') as month, ROUND(COUNT(DISTINCT event_id)::DECIMAL / COUNT(DISTINCT session_id), 2) AS "average"
FROM main_production.all_events
-- modify your date range here
WHERE time >= DATEADD('month', -6, getdate())
GROUP BY 1
Average Events per Session by User Property
To calculate this, join the all_events table to the users table and add a GROUP BY clause based on the user properties you want to use for segmentation.
Also available in Mode: Average Events Per Session by Plan
SELECT user_property, ROUND(COUNT(DISTINCT event_id)::DECIMAL / COUNT(DISTINCT session_id), 2) AS "average"
FROM main_production.all_events JOIN main_production.users
ON main_production.all_events.user_id = main_production.users.user_id
-- modify your date range here
WHERE time >= DATEADD('day', -30, getdate())
GROUP BY 1
Average Sessions per User
The average number of sessions per user can be used in the same way as average events per session and can be modified in the same manner. To find the average number of sessions per user, divide the number of unique sessions by the number of unique users across the time period.
Also available in Mode: Average Sessions Per User
SELECT ROUND(COUNT(DISTINCT session_id)::DECIMAL / COUNT(DISTINCT user_id), 2)
FROM main_production.sessions
-- modify the date range here
WHERE time >= DATEADD('day', - 30, GETDATE())
For steps to set up a chart of this in Heap, see See the average event count per user over time and select Session as your event.
Average Sessions Before Signup (or any other event)
WITH first_signup AS
(SELECT user_id, min(time) AS first_time
FROM main_production.sign_up
GROUP BY 1),
sessions_pre_signup AS
( SELECT sessions.user_id, count(distinct session_id) AS total_sessions
FROM main_production.sessions
JOIN first_signup ON first_signup.user_id = sessions.user_id AND sessions.time <= first_signup.first_time
GROUP BY 1)
SELECT AVG(total_sessions)
FROM sessions_pre_signup
For steps to set up a chart of this in Heap, see See the average event count per user over time, select Session as your event, and filter for users who haven’t signed up yet (you may need to create a new segment for this group of users).
Average Property Value
SELECT [property_to_group_by],
Avg([numeric_property])
FROM main_production.[event_name]
GROUP BY 1
ORDER BY 1
Bounce Rate
WITH data as (
SELECT time::date as day, session_id, count(*) as pageviews
FROM heap_production.pageviews
GROUP BY 1,2
)
SELECT day, count(case when pageviews = 1 then 1 else null end)::DECIMAL / count(*) as bounce_rate
from data
GROUP BY 1
ORDER BY 1
For steps to set up a bounce rate chart in Heap, see Acquisition analysis overview.
Funnels
This section will go through Heap Connect queries that enable you to conduct funnel analysis by measuring all types of funnels. Funnels are fundamental to understanding any flow through your site, from onboarding to filling out forms, to virtually any step users are expected to take within your product.
In this section, we will cover:
- Building a basic funnel
- Modifying the date range
- Limiting the funnel to one session
- Increasing flexibility (steps can occur in any order)
- Conversion rate over time (week over week)
For steps to do funnel analysis in Heap, see Funnel analysis overview.
The Basic Funnel
The funnel below can be used to mirror the funnel in Heap’s dashboard. It calculates the number of users who have completed Step 1 within the time frame, the number of users who have completed Step 2, and then Step 3. It’s both sequential and time-constrained, though there are many modifications you may wish to make, as shown below.
Also available in Mode: Onboarding Flow – Funnel Report
WITH e1
AS (
SELECT DISTINCT user_id
,1 AS step_1
,MIN(time) AS step_1_time
-- change to table name of event 1 in your funnel
FROM main_production.step_1 AS e1
WHERE time BETWEEN '2016-04-13'
AND '2016-04-21'
GROUP BY 1
)
,e2
AS (
SELECT e1.user_id
,1 AS step_2
,MIN(time) AS step_2_time
-- change to table name of event 2 in your funnel
FROM main_production.step_2 AS e2
INNER JOIN e1 ON e1.user_id = e2.user_id
WHERE time BETWEEN step_1_time
AND '2016-04-21'
GROUP BY 1
)
,e3
AS (
SELECT e3.user_id
,1 AS step_3
,MIN(time) AS step_3_time
-- change to table name of event 3 in your funnel
FROM main_production.step_3 AS e3
INNER JOIN e2 ON e2.user_id = e3.user_id
WHERE TIME BETWEEN step_2_time
AND '2016-04-21'
GROUP BY 1
)
SELECT SUM(step_1) AS "Step 1"
,SUM(step_2) AS "Step 2"
,SUM(step_3) AS "Step 3"
,ROUND(SUM(step_2)/SUM(step_1)::DECIMAL, 2) as step_1to2_conversion
,ROUND(SUM(step_3)/SUM(step_2)::DECIMAL,2) as step_2to3_conversion
,ROUND(SUM(step_3)/SUM(step_1)::DECIMAL, 2) as overall_conversion
FROM (
SELECT e1.user_id
,step_1
,step_1_time
,step_2
,step_2_time
,step_3
,step_3_time
FROM e1
LEFT JOIN e2 ON e1.user_id = e2.user_id
LEFT JOIN e3 ON e2.user_id = e3.user_id
) funnel
Breaking it Down
The goal of this query is to get both the counts and the percentage that make it from step to step.
E1:
This table is selecting all the users and the first time they completed event 1 during the given time range.
E2:
This selects all the users who completed Step 1 and the first time they did Step 2 between the time they did Step 1 and the end date.
E3:
This selects all the users who completed Step 1 and Step 2 and the first time they did Step 3 between the time they did Step 2 and the end date.
The Counts and Percentages:
This query first creates a table with the user ID, the time each step was completed, and a 1 for each step the user completed (NULL otherwise). It then sums each step column to get the total number of users who completed each step, and finally calculates the conversion rate between Step 1 and Step 2, Step 2 and Step 3, and the overall conversion rate.
Putting it all together, you get a table with the following results:
| Step 1 Total | Step 2 Total | Step 3 Total | Step 1 → Step 2 % | Step 2 → Step 3 % | Step 1 → Step 3 % |
To take advantage of Mode’s custom visualizations, we tweaked our query to return:
| Step 1 | Count |
| Step 2 | Count |
| Step 3 | Count |
Funnels Grouped by an Event-Level Property
Each event has a set of properties, ranging from attributes Heap captures automatically, like referrer, UTM parameters, and device type, to any properties you set using snapshots, heap.addEventProperties, or heap.track.
To view a breakdown of the conversion rate based on these event-level properties, only a couple of modifications need to be made.
First, select the event that contains the event-level property you would like to analyze. In this example, the event-level property is a property of event 1. This query selects the utm_source (the event-level property) for the first event so that the event property can be included in the GROUP BY clause of the funnel.
In the funnel, include the event-level property in the SELECT statement and GROUP BY clause. You can also join this on the users table (JOIN main_production.users ON main_production.users.user_id = e1.user_id) to analyze your conversion rates based on user-level properties as well.
This funnel is valuable for many reasons. For example, eCommerce sites can send the number of items in the cart as an event-level property for the beginning of the checkout process. Using this query, you can analyze the correlation between conversion rates and number of items in the cart.
Also available in Mode: Onboarding Funnel Group by Event Level Property
For steps to do funnel analysis in Heap, see Funnel analysis overview and apply a group by with any event-level property.
WITH e1
AS (
SELECT DISTINCT e1.user_id
,1 AS step_1
,min(TIME) AS step_1_time
,utm_source
FROM main_production.define_event AS e1
INNER JOIN (
SELECT user_id
,min(TIME) AS mintime
FROM main_production.define_event
WHERE DATE (TIME) BETWEEN '2016-04-13'
AND '2016-04-20'
GROUP BY 1
) AS min_value ON e1.user_id = min_value.user_id
AND e1.TIME = min_value.mintime
WHERE DATE (TIME) BETWEEN '2016-04-13'
AND '2016-04-20'
GROUP BY 1
,4
)
,e2 AS (
SELECT e1.user_id
,1 AS step_2
,MIN(time) AS step_2_time
-- change to table name of event 2 in your funnel
FROM main_production.step_2 AS e2
INNER JOIN e1 ON e1.user_id = e2.user_id
WHERE time BETWEEN step_1_time
AND '2016-04-21'
GROUP BY 1
)
,e3
AS (
SELECT e3.user_id
,1 AS step_3
,MIN(time) AS step_3_time
-- change to table name of event 3 in your funnel
FROM main_production.step_3 AS e3
INNER JOIN e2 ON e2.user_id = e3.user_id
WHERE TIME BETWEEN step_2_time
AND '2016-04-21'
GROUP BY 1
)
SELECT utm_source
,sum(step_1) AS "Step 1"
,sum(step_2) AS "Step 2"
,sum(step_3) AS "Step 3"
FROM (
SELECT e1.user_id
,step_1
,step_1_time
,step_2
,step_2_time
,step_3
,step_3_time
,utm_source
FROM e1
LEFT JOIN e2 ON e1.user_id = e2.user_id
LEFT JOIN e3 ON e2.user_id = e3.user_id
) funnel
GROUP BY 1
Restricting Funnels to One Session
In this funnel, user ID and session ID are aligned to make sure the funnel spans a single session. There are a couple of tweaks needed in each step of the query.
E1:
First we need to gather the user ID, session ID, and event time for step one data. Make sure you are grouping by both user ID and session ID so that all of the users’ sessions that occur in the time frame are included in the funnel.
E2 and E3:
Next, you want to modify the second part of this query to join step 2 and step 3 data based on both user ID and session ID. Adding an AND condition within the join allows us to join this data successfully. Make sure you continue to include session ID in your SELECT and GROUP BY clauses.
Conversion Rate:
To calculate the conversion rate, we join these three event tables, count the distinct users who completed steps one, two, and three, and then divide the step three count by the step one count to get the overall conversion rate.
Also available in Mode: Onboarding Funnel limited to one session
For steps to do funnel analysis in Heap, see Funnel analysis overview and set the date range granularity to a single session.
WITH e1 AS
(
SELECT DISTINCT user_id,
session_id ,
Min(time) AS step_1_time
-- change to table name of event 1 in your funnel
FROM main_production.step_1 AS e1
WHERE time BETWEEN '2016-04-13' AND '2016-04-21'
GROUP BY 1,
2 ), e2 AS
(
SELECT e1.user_id,
e1.session_id ,
Min(time) AS step_2_time
-- change to table name of event 2 in your funnel
FROM main_production.step_2 AS e2
INNER JOIN e1
ON e1.user_id = e2.user_id
AND e1.session_id = e2.session_id
WHERE time BETWEEN step_1_time AND '2016-04-21'
GROUP BY 1,
2 ), e3 AS
(
SELECT e3.user_id,
e3.session_id ,
Min(time) AS step_3_time
-- change to table name of event 3 in your funnel
FROM main_production.step_3 AS e3
INNER JOIN e2
ON e2.user_id = e3.user_id
AND e2.session_id = e3.session_id
WHERE time BETWEEN step_2_time AND '2016-04-21'
GROUP BY 1,
2 )
SELECT count(DISTINCT step_1) AS step_one ,count(DISTINCT step_2) AS step_two ,count(DISTINCT step_3) AS step_three ,round(count(DISTINCT step_3)/count(DISTINCT step_1)::decimal, 2) AS conversion_rate
-- calculate additional percentages here
FROM (
SELECT e1.user_id AS step_1 ,
e1.session_id ,
step_1_time ,
e2.user_id AS step_2 ,
step_2_time ,
e3.user_id AS step_3 ,
step_3_time
FROM e1
LEFT JOIN e2
ON e1.user_id = e2.user_id
AND e1.session_id = e2.session_id
LEFT JOIN e3
ON e2.user_id = e3.user_id
AND e2.session_id = e3.session_id
ORDER BY 1 ) funnel
Conversion Rate Over Time
This report gives users a window (30 days per step in the query below) to complete the funnel and tracks how your conversion rate changes over time.
Now available in Mode: Rolling Conversion Rate
For steps to see conversion rate over time in Heap, see Calculate conversion rate between two events.
WITH e1 AS(
SELECT DISTINCT user_id
,1 AS step_1
,MIN(time) AS step_1_time
,TO_CHAR(e1.time, 'YYYY-MM') as "month"
FROM main_production.[event_1] AS e1
WHERE TIME BETWEEN '2015-04-01' --start time
AND '2016-04-01' --end time
GROUP BY 1,4
),
e2 as (
SELECT e1.user_id, 1 AS step_2, MIN(e2.time) AS step_2_time, e1.step_1_time
FROM main_production.[event_2] AS e2
JOIN e1 ON e1.user_id=e2.user_id AND e1.step_1_time >= DATEADD('day', -30, e2.time)
WHERE time BETWEEN step_1_time AND '2016-04-01'
GROUP BY 1, 4
),
e3 as (
SELECT e3.user_id, 1 AS step_3, MIN(time) AS step_3_time, e2.step_1_time
FROM main_production.[event_3] AS e3
JOIN e2 ON e2.user_id=e3.user_id AND e2.step_1_time >= DATEADD('day', -30, e3.time)
WHERE time BETWEEN step_2_time AND '2016-04-01'
GROUP BY 1, 4
)
SELECT MONTH, ROUND(SUM(step_3)/SUM(step_1)::DECIMAL, 2) AS "Conversion Rate"
FROM(
SELECT e1.user_id, step_1, e1.step_1_time, step_3, step_3_time, e1.month
FROM e1 LEFT JOIN e2 ON e1.user_id = e2.user_id AND e1.step_1_time=e2.step_1_time
LEFT JOIN e3 ON e2.user_id= e3.user_id AND e2.step_1_time=e3.step_1_time
) funnel
GROUP BY 1
ORDER BY 1
Time to Conversion / Average Time to Event
Time can be an interesting metric to analyze. The time it takes for a user to make their first purchase from the date first seen, or the time it takes a user to fill out a form from the first field to submit, or the time between two purchases can be measured and optimized. You can analyze which behaviors, user properties, or first touch properties are associated with faster conversion rates.
This example measures the time it takes users to purchase an item but can be easily adapted to measure a slew of different events, and segmented by a variety of user and event-level properties.
Also available in Mode: Average Time to Event
For steps to set up a chart of this in Heap, see See how long it takes users to convert.
SELECT avg(duration) AS "Average hours until Purchase"
FROM (
SELECT user_id
,datediff('milliseconds', start_time, end_time)::DECIMAL / 1000 / 60 / 60 AS "duration"
FROM (
SELECT main_production.users.user_id
,min(joindate) AS start_time
,min(main_production.purchase.time) AS end_time
FROM main_production.purchase
JOIN main_production.users ON main_production.purchase.user_id = main_production.users.user_id
GROUP BY 1
) user_times
) durations
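One hedged way to segment this query, as described above, is to group by a user-level property; the plan column below is illustrative and should be swapped for whatever user property or first touch attribute you care about:
SELECT plan
,ROUND(AVG(duration)::DECIMAL, 2) AS "Average Hours until Purchase"
FROM (
SELECT main_production.users.user_id
,main_production.users.plan
,DATEDIFF('milliseconds', MIN(joindate), MIN(main_production.purchase.time))::DECIMAL / 1000 / 60 / 60 AS duration
FROM main_production.purchase
JOIN main_production.users ON main_production.purchase.user_id = main_production.users.user_id
GROUP BY 1, 2
) user_durations
GROUP BY 1
ORDER BY 2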
Path Analysis
This query allows you to track the most common flows or ‘paths’ into and out of your product via Heap Connect.
As is, the SQL selects the first 5 events of every session consisting of at least five events. It then aggregates the number of users who have completed each flow and displays the top 20 most common paths. For example, looking at the table below, we see the most common user flow is home-view_page -> signup_view_page -> signup_enter_name -> signup_enter_email -> signup_enter_password, and that this particular flow has occurred 2791 times.
We recommend these instructions in cases where you want to filter for users who meet certain criteria based on data that only exists in your warehouse. For all other path analysis cases, we recommend using the Journeys chart type in Heap. See Journeys overview to get started.
Heap is unique in the flexibility offered when defining events. A given event_id across all Redshift or BigQuery tables corresponds to a unique event recorded by Heap. That said, the same event_id may exist in multiple event tables. This is because Heap provides the flexibility to create multiple event definitions that correspond to the same raw event. For instance, you may define the following two events in the product:
- Click CTA, defined as Click on .cta
- Click CTA - Homepage, defined as Click on .cta with a filter where Path equals /
If a user clicks the CTA on the homepage, a new event will be recorded on both event tables because it corresponds with both event definitions. As a result, two events with the same event_id will be included in the all_events table, since it contains every recorded instance of all defined and custom events.
In order to select the events most valuable to you when multiple events fire at the same time, this query selects the most specific events. This is based on the heuristic that events with more specific definitions fire less often overall than events with more general definitions. In the example above, although two events fire when a user clicks .cta on the homepage, this query would only include the Click CTA - Homepage event, rather than the global Click CTA event, giving you more insight into your users' paths.
Also available in Mode: Path Analysis
In Mode's visualization, each block of color represents a particular event; its size is relative to the percentage of users who have completed that event. By hovering over a particular chain of events starting at the center of the circle, you can trace the relative and absolute percentage of users who have followed that path within your app.
WITH event_count
AS (
SELECT event_table_name
,count(*) AS cardinality
FROM main_production.all_events
WHERE TIME > dateadd('day', - 30, getDate())
GROUP BY 1
)
,all_events
AS (
SELECT DISTINCT event_id
,main_production.all_events.user_id AS user_id
,main_production.all_events.event_table_name AS event_name
,main_production.all_events.TIME AS occurred_at
,cardinality AS number
FROM main_production.all_events
LEFT JOIN event_count ON main_production.all_events.event_table_name =event_count.event_table_name
ORDER BY 1
,2
)
,events
AS (
SELECT all_events.user_id
,all_events.event_id
,event_name
,occurred_at
,cardinality
FROM all_events
INNER JOIN (
SELECT event_id
,user_id
,min(number) AS cardinality
FROM all_events
GROUP BY 1
,2
) AS event ON all_events.number = event.cardinality
AND all_events.event_id = event.event_id
AND all_events.user_id = event.user_id
ORDER BY 1
,2
,4
)
SELECT e1
,e2
,e3
,e4
,e5
,COUNT(*) AS occurrences
FROM (
--Pivot out first five events in each session
SELECT user_id
,session
,MAX(CASE
WHEN event_number = 1
THEN event_name
ELSE NULL
END) AS e1
,MAX(CASE
WHEN event_number = 2
THEN event_name
ELSE NULL
END) AS e2
,MAX(CASE
WHEN event_number = 3
THEN event_name
ELSE NULL
END) AS e3
,MAX(CASE
WHEN event_number = 4
THEN event_name
ELSE NULL
END) AS e4
,MAX(CASE
WHEN event_number = 5
THEN event_name
ELSE NULL
END) AS e5
FROM (
-- Find event number in session
SELECT z.*
,ROW_NUMBER() OVER (
PARTITION BY user_id
,session ORDER BY occurred_at
) AS event_number
FROM (
-- Sum breaks to find sessions
SELECT y.*
,SUM(BREAK) OVER (
ORDER BY user_id
,occurred_at ROWS UNBOUNDED PRECEDING
) AS session
FROM (
-- Add flag if last event was more than 10 minutes ago
SELECT x.*
,CASE
WHEN last_event IS NULL
OR occurred_at >= last_event + INTERVAL '10 MINUTE'
THEN 1
ELSE 0
END AS BREAK
FROM (
-- Find last event
SELECT *
,LAG(occurred_at, 1) OVER (
PARTITION BY user_id ORDER BY occurred_at
) AS last_event
FROM events
) x
) y
) z
) a
WHERE event_number <= 5
GROUP BY 1
,2
) final
GROUP BY 1
,2
,3
,4
,5
ORDER BY 6 DESC LIMIT 20
Attribution
This section will go through Heap Connect queries that enable you to measure attribution to better understand where your users are coming from.
The influence chart in Heap allows you to analyze what behaviors or marketing channels have influenced behavior of interest. For instructions to set up an influence chart, see Influence analysis overview. For more info about how attribution is measured in Heap, see Understand attribution.
Top Referrers
This queries the top N referrers and the number of sessions they generate.
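The full query isn't reproduced here, but a minimal sketch (assuming the standard main_production.sessions schema, with the LIMIT controlling N) looks like:
SELECT referrer
,COUNT(*) AS "Sessions"
FROM main_production.sessions
-- modify your date range here
WHERE time >= DATEADD('day', -30, getdate())
-- optionally exclude direct traffic
-- AND referrer IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10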
First Touch Properties
This query generates a table that matches Heap’s Users view. You can export and select all the user-level properties and all the first touch properties.
How it works
This query starts off by calculating the first session for each user, and joins the first sessions data with the users table based on the unique user ID. From here, each user-level property and session-level property can be selected.
Applications
In addition to generating a list of all users, this query can be modified to contain the user-level properties of a subset of users. You can generate a list of users who were first seen during a specific period of time by adding a WHERE clause with a date range. This table can be joined to other reports to track a particular cohort's experience through your product. You can also generate a list of email addresses for an email blast targeted at users who joined via a certain UTM campaign.
Also available in Mode: First Touch Properties
SELECT all_sessions.user_id
,first_session.first_time AS first_seen
,referrer
,utm_source
,utm_campaign
,utm_medium
,utm_content
FROM main_production.sessions AS all_sessions
INNER JOIN (
SELECT user_id
,MIN(time) AS first_time
FROM main_production.sessions
GROUP BY user_id
) AS first_session ON all_sessions.user_id = first_session.user_id
AND all_sessions.time = first_session.first_time
INNER JOIN main_production.users AS users ON all_sessions.user_id = users.user_id
--[optional] insert where clause here
ORDER BY 3 DESC
First Touch Property by User Count
This query returns the number of users that fall into each category. In this case, the number of users who have come in via each UTM source will be displayed. This can be adapted and broken down by any first touch or user-level properties. For example, this query can be modified to return the number of users who have the initial referrer Facebook and the initial device type as mobile, which in turn can help you make decisions about successful ad campaigns.
Also available in Mode: Users per UTM Source
SELECT utm_source, COUNT(DISTINCT all_sessions.user_id)
FROM main_production.sessions AS all_sessions
INNER JOIN (
SELECT user_id
,MIN(time) AS first_time
FROM main_production.sessions
GROUP BY user_id
) AS first_session ON all_sessions.user_id = first_session.user_id
AND all_sessions.time = first_session.first_time
INNER JOIN main_production.users AS users ON all_sessions.user_id = users.user_id
GROUP BY 1
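As a sketch of the modification described above, the query below counts users whose first session had a Facebook referrer on a mobile device; the referrer pattern and the device_type value are illustrative and should be adjusted to the values in your data:
SELECT COUNT(DISTINCT all_sessions.user_id) AS "Users"
FROM main_production.sessions AS all_sessions
INNER JOIN (
SELECT user_id
,MIN(time) AS first_time
FROM main_production.sessions
GROUP BY user_id
) AS first_session ON all_sessions.user_id = first_session.user_id
AND all_sessions.time = first_session.first_time
-- adjust these filters to the first touch properties you care about
WHERE all_sessions.referrer ILIKE '%facebook%'
AND all_sessions.device_type = 'Mobile'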
Multi-Touch Attribution
Users come in via many channels, and the campaign that drives the first touch doesn’t always match with the campaign on the conversion event. Because Heap automatically captures the referrer and UTM parameters for each session, you can build a query that compares the channel for each session a user has. This query compares the UTM source of the first touch with the UTM source on the session with the conversion event, and calculates the number of users who fall into each category.
How it works
First, we calculate the UTM campaign of the first touch by finding the session with the earliest start time, then join the first session with the conversion event based on the user_id. Finally, we select the initial UTM campaign, the conversion event's UTM campaign, and the breakdown of users. Make sure to include a GROUP BY statement to categorize your users.
Also available in Mode: Multitouch Attribution
SELECT initial.utm_campaign AS "INITIAL"
,CASE
WHEN event_2.utm_campaign IS NULL
THEN 'None'
ELSE event_2.utm_campaign
END AS "LAST TOUCH"
,count(DISTINCT initial.user_id) AS "Number of Users"
FROM (
SELECT all_sessions.user_id
,CASE
WHEN all_sessions.utm_campaign IS NULL
THEN 'None'
ELSE all_sessions.utm_campaign
END AS utm_campaign
FROM main_production.sessions AS all_sessions
INNER JOIN (
SELECT user_id
,MIN(time) AS first_time
FROM main_production.sessions
GROUP BY user_id
) AS first_session ON all_sessions.user_id = first_session.user_id
AND all_sessions.time = first_session.first_time
) AS initial
JOIN main_production.conversion_event AS event_2
ON initial.user_id = event_2.user_id
GROUP BY 1
,2
ORDER BY 3 DESC
Retention
This Heap Connect query calculates session-based retention on a monthly basis. Each user is segmented into a cohort based on the month of their join date, and is counted as retained for a given month if they have a session during that month.
For steps to conduct retention analysis in Heap, see Retention analysis overview.
Also available in Mode: Retention Cohorted by Joindate
In this chart, each user is assigned to a cohort based on their join date. Each cohort is represented by a different color (e.g. the users who joined in July 2015 are in the 2015-07, or orange, cohort). It then traces the percentage of users who have at least one additional session in the following months. If we continue to look at the orange cohort, this chart shows us that .16 or 16% of users came back in July and had an additional session, .06 or 6% of users came back in August for an additional session, .04 or 4% came back in September, and so on.
You can compare different cohorts by adjusting the query to contain the distance from the join date rather than the month.
Also available in Mode: Comparing Retention for Cohorts by Joindate
This retention report cohorts users in the same way as the previous retention report; the orange cohort is still mapped to users who first joined in July. However, rather than mapping the percentage of users retained to a particular month, this query uses relative time to track retention across periods (same month as join date, one month later, two months later, etc.). It then stacks each cohort to allow you to analyze how the join date is correlated with retention. If you look at Month 0, you'll see that users who joined in July 2015 have a retention rate of 16% during that first month, whereas users who joined in February 2016 have a 12% retention rate.
There is also a slight caveat in this view. You will notice that several cohorts have 0% retained users. In some cases this is simply because the data doesn't exist yet; for example, there is no data for five months after May 2016.
Month-to-Month Retention
--get a cohort based on join date
WITH new_user_activity
AS (
SELECT activity.user_id
,activity.TIME AS DATE
FROM main_production.sessions AS activity
INNER JOIN main_production.users AS users ON users.user_id = activity.user_id
AND users.joindate = activity.TIME
WHERE date_part('year', joindate) > 2014
)
--number of users in cohort
,cohort_active_user_count
AS (
SELECT to_char(DATE, 'YYYY-MM') AS DATE
,count(DISTINCT user_id) AS count
FROM new_user_activity
GROUP BY 1
)
--change format to change granularity
SELECT joindate
,period
,new_users
,retained_users
,retention
FROM (
SELECT to_char(new_user_activity.DATE, 'YYYY-MM') AS "joindate"
,'Month ' ||
--change month to change granularity
datediff('month', new_user_activity.DATE, future_activity.TIME) AS period
,max(cohort_size.count) AS new_users
,-- all equal in group
count(DISTINCT future_activity.user_id) AS retained_users
,round(count(DISTINCT future_activity.user_id) / max(cohort_size.count)::DECIMAL, 2) AS retention
FROM new_user_activity
LEFT JOIN main_production.sessions AS future_activity ON new_user_activity.user_id = future_activity.user_id
AND new_user_activity.DATE < future_activity.TIME
AND dateadd('month', 9, new_user_activity.DATE) >= future_activity.TIME
LEFT JOIN cohort_active_user_count AS cohort_size ON to_char(new_user_activity.DATE, 'YYYY-MM') = cohort_size.DATE
GROUP BY 1
,2
) t
WHERE period IS NOT NULL
ORDER BY 1
,2
Joining Heap with Revenue Data
One of the most powerful aspects of Heap Connect is the ability to join a complete set of user interactions with other data sources to see revenue data, such as a user’s order history. This section will cover topics such as calculating the average revenue per user and the lifetime value of a user based on their interactions in your product and attribution channels. You can check out a complete list of all queries in Common Queries.
ARPU per UTM Source
This query combines Heap’s attribution data with order history data, allowing you to compare how each source is correlated with a user’s downstream behavior.
Also available in Mode: ARPU by UTM Source
WITH first_touch
AS (SELECT all_sessions.user_id,
identity,
first_session.first_time AS first_seen,
utm_source
--add any user level properties or session level properties here
FROM main_production.sessions AS all_sessions
inner join (SELECT user_id,
Min(TIME) AS first_time
FROM main_production.sessions
GROUP BY user_id) AS first_session
ON all_sessions.user_id = first_session.user_id
AND all_sessions.TIME = first_session.first_time
inner join main_production.users AS users
ON all_sessions.user_id = users.user_id)
SELECT CASE
WHEN utm_source IS NULL THEN 'none'
ELSE utm_source
END,
Round(SUM(order_total) :: DECIMAL, 2) AS total_revenue,
Round(SUM(order_total) :: DECIMAL / Count(DISTINCT customer_id) ::
DECIMAL, 2)
AS avg_per_customer
FROM heap.TRANSACTION
join first_touch
ON customer_id = identity
GROUP BY 1
Monthly ARPU per UTM Source
With a slight tweak to your query, you can see how the UTM source is correlated with the ARPU on a month-to-month basis.
Also available in Mode: Monthly ARPU per UTM
WITH first_touch
AS (SELECT all_sessions.user_id,
identity,
first_session.first_time AS first_seen,
utm_source
--add any user level properties or session level properties here
FROM main_production.sessions AS all_sessions
inner join (SELECT user_id,
Min(TIME) AS first_time
FROM main_production.sessions
GROUP BY user_id) AS first_session
ON all_sessions.user_id = first_session.user_id
AND all_sessions.TIME = first_session.first_time
inner join main_production.users AS users
ON all_sessions.user_id = users.user_id)
SELECT utm_source,
CASE
WHEN Length(Substring(order_date, 0, Position('/' IN order_date))) = 2
THEN
'20'
||
Right(order_date, 2)
||
'-'
||
Substring(order_date, 0, Position('/' IN order_date))
ELSE '20'
|| Right(order_date, 2)
||'-0'
|| Substring(order_date, 0, Position('/' IN order_date))
END AS order_month,
Count(DISTINCT customer_id),
Round(SUM(order_total) :: DECIMAL, 2) AS total_revenue,
Round(SUM(order_total) :: DECIMAL / Count(DISTINCT customer_id) ::
DECIMAL, 2)
AS avg_per_customer
FROM heap.TRANSACTION
join first_touch
ON customer_id = identity
WHERE utm_source IS NOT NULL
GROUP BY 1,
2
ORDER BY 2
ARPU by Behavior
You can also combine Heap data with revenue data to determine whether certain actions, or the number of times an action is performed, are correlated with higher returns. This report segments users into two categories: users who have uploaded files and users who haven't. It then calculates the ARPU for each segment. We can see here that the ARPU is roughly seven times higher for users who have uploaded files.
Also available in Mode: ARPU by Behavior
WITH user_info
AS (SELECT identity,
users.user_id,
SUM(order_total) AS total_revenue,
Count(event_id) AS event_count
FROM main_production.users AS users
join heap.TRANSACTION
ON identity = customer_id
left join main_production.dashboard_galleries_click_add_new_files AS event
ON users.user_id = event.user_id
GROUP BY 1,
2)
SELECT CASE
WHEN event_count = 0 THEN 'has not uploaded files'
ELSE 'has uploaded files'
END,
Round(SUM(total_revenue) :: DECIMAL / Count(DISTINCT user_id), 2) AS
avg_revenue_per_user
FROM user_info
GROUP BY 1
Churned Customer Revenue
Heap Connect allows you to get a better understanding of your churned customers as well. You can answer questions like “Where did they come from?”, “What did they do in my product?”, and “How much revenue did a customer bring in before churning?”. The table below combines Heap session data, server-side events, and revenue data to enhance your understanding of churned customers.
Also available in Mode: Churned Customer Behavior
WITH first_touch
AS (SELECT all_sessions.user_id,
identity,
first_session.first_time AS first_seen,
utm_source
--add any user level properties or session level properties here
FROM main_production.sessions AS all_sessions
inner join (SELECT user_id,
Min(TIME) AS first_time
FROM main_production.sessions
GROUP BY user_id) AS first_session
ON all_sessions.user_id = first_session.user_id
AND all_sessions.TIME = first_session.first_time
inner join main_production.users AS users
ON all_sessions.user_id = users.user_id)
SELECT utm_source,
Round(SUM(order_total) :: DECIMAL, 2)
AS churned_customer_total_value,
Round(Count(DISTINCT order_id) :: DECIMAL / Count(DISTINCT user_id), 2)
AS
avg_orders_per_user,
Round(SUM(order_total) :: DECIMAL / Count(DISTINCT first_touch.identity),
2) AS
avg_ltv
FROM first_touch
join heap.TRANSACTION
ON customer_id = first_touch.identity
join main_production.account_modification
ON first_touch.identity = main_production.account_modification.identity
WHERE account_change = 'cancel'
AND utm_source IS NOT NULL
GROUP BY 1
Modeling Your Data
For info on how to dig into user and session info in Heap, see Users overview.
Creating An Enhanced Users Table
This query provides an example of how to recreate the users table, adding the Initial User Properties (e.g. Initial Referrer, Initial Browser) that are displayed in your Heap account.
BEGIN;
DROP TABLE IF EXISTS enhanced_users;
CREATE TABLE enhanced_users AS
SELECT u.*
,s.time AS date_first_seen
,s.referrer AS initial_referrer
,s.utm_source AS initial_utm_source
,s.utm_medium AS initial_utm_medium
,s.utm_campaign AS initial_utm_campaign
,s.city AS initial_city
,s.region AS initial_region
,s.country AS initial_country
,s.landing_page AS initial_landing_page
,s.browser AS initial_browser
,s.device_type AS initial_device_type
,s.platform AS initial_platform
,s.ip AS initial_ip
FROM heap_production.sessions s
JOIN heap_production.users u ON s.user_id = u.user_id
JOIN
(SELECT user_id, min(TIME) AS first_session_time
FROM heap_production.sessions
GROUP BY user_id) AS first_session
ON s.user_id = first_session.user_id
AND first_session_time = s.time;
END;
Sessionizing Client-Side and Server-Side Events
BEGIN;
-- Prepending a table with # makes a temp table. Change this if you want a table to persist.
DROP TABLE IF EXISTS #sessions;
CREATE TABLE #sessions AS
SELECT user_id
,min(time) over (partition by calc_session_id) as session_time
,time as event_time
,calc_session_id
,split_part(calc_session_id, '_', 2) as visit_number_in_range
,event_table_name
,first_value(path) over (partition by calc_session_id) as landing_page
,first_value(utm_source) over (partition by calc_session_id) as utm_source
,first_value(utm_medium) over (partition by calc_session_id) as utm_medium
,first_value(utm_campaign) over (partition by calc_session_id) as utm_campaign
,path
,total_price
,subtotal_price
FROM
(SELECT *
,user_id || '_' || sum(new_session) over (partition by user_id order by time rows unbounded preceding ) AS calc_session_id
FROM
(SELECT e.*
,p.path
,p.utm_source
,p.utm_medium
,p.utm_campaign
,o.total_price
,o.subtotal_price
-- Change your session definition here i.e. 30 * 60 = 30 minutes so 60 * 60 = 1 hour.
,case when extract(epoch from e.time) - lag(extract(epoch from e.time)) over (partition by e.user_id order by e.time) >= 30 * 60 then 1 else 0 end as new_session
FROM heap_production.all_events e
LEFT JOIN heap_production.pageviews p ON p.user_id = e.user_id
AND p.event_id = e.event_id
AND p.session_id = e.session_id
LEFT JOIN heap_production.confirmed_order o ON o.user_id = e.user_id
AND o.event_id = e.event_id
AND o.session_id = e.session_id
-- This example focuses on pageviews and a conversion event called `confirmed_order` from Heap's Shopify integration.
-- If you extend this model to include other tables and want properties from those tables, follow the left join pattern above and add the event_table_name to the filter below.
WHERE (event_table_name = 'pageviews' or event_table_name = 'confirmed_order')
-- Change your date range here.
-- AND e.time > current_date - interval '1 week'
ORDER BY time) s) t
ORDER BY time;
END;