Heap Connect: Common Queries

 

This article lists all of the common queries in our Heap Connect Guide, which you can copy and paste into your BI Tool or run directly on your cluster. However, some queries require you to update the property name, event names, or date ranges to fit your needs.

These queries are also available in Mode. The visualizations may take a few seconds to load.

 

Basic Queries

Event Counts

SELECT device_type
          ,plan
          ,COUNT(session_id)
  FROM main_production.sessions AS sessions
  INNER JOIN main_production.users AS users ON sessions.user_id = users.user_id
  GROUP BY 1, 2
For steps to set up a chart of this in Heap, see Setting Up a Usage Over Time Chart.

Average Property Value

SELECT  [Property_to_Group_By], AVG([Numeric_Property])
  FROM main_production.[event_name] 
  GROUP BY 1
  ORDER BY 1

Aggregate Metrics

In this section, you’ll find a variety of Heap Connect queries to measure aggregate metrics for your site, including average time, top events, and more. The value of these queries increases as you segment them by user personas, and the same logic can also be applied to calculate more specific statistics.

Top 10 Events

The top 10 events shed light on how users are interacting with your app. Looking at this list generates a lot of questions: Why are users clicking this button? Why are users viewing this page? Are these two actions correlated? Does the high event count come from users’ success within the product, or is this caused by confusion?

Also available in Mode: Top 10 Events

You can modify the WHERE statement to adjust the date range. You can include the date in SELECT statement and GROUP BY clause to measure how the most frequently performed actions change over time. You can also adjust the number of events you are looking at by modifying the LIMIT.

SELECT 
  all_events.event_table_name AS "Event",
        COUNT(*) AS "Event Count"
FROM main_production.all_events AS all_events
 WHERE time = DATEADD(day,-30, getdate())
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
To see your top events in Heap, check out the raw events page.

Average Time on Site

This query is primarily beneficial to you if you have a content site, and there aren’t a lot of user interactions to capture while reading, and avid readers don’t always comment or share. Although this is a proxy for engagement, drastic changes in time on site can cue you in to investigate how users are engaging with your product.

Also available in Mode: Average Time on Site

You can tweak this query to analyze average time between events, or break it down into different segments of users to get a better understanding of your site’s usage.

SELECT ROUND(AVG(duration)::FLOAT, 4) "Average Session time in Minutes"
  FROM (
          SELECT session_id || '-' || user_id AS "unique"
                  ,MIN(time) AS "start"
                  ,MAX(time) AS "last"
                ,((DATEDIFF('milliseconds', MIN(time), MAX(time))::FLOAT / 1000  ) / 60) AS "duration"
          FROM main_production.all_events
          -- modify your date range here 
          WHERE time = DATEADD('month', - 2, getDate())
          GROUP BY 1
          )

Average Time on Page

WITH ordered_pageviews AS
  (SELECT *,
          row_number() OVER (PARTITION BY user_id, session_id ORDER BY TIME ASC) AS rn
   FROM main_production.pageviews
   WHERE TIME = DATEADD('month', - 1, getDate()))
SELECT a.path,
       avg(pageview_time) / 60 as avg,
       round(percentile_cont(0.25) within group (order by pageview_time) / 60, 2) as p25,
       round(percentile_cont(0.5) within group (order by pageview_time) / 60, 2) as median,
       round(percentile_cont(0.75) within group (order by pageview_time) / 60, 2) as p75,
       round(percentile_cont(0.9) within group (order by pageview_time) / 60, 2) as p90 
FROM
  (SELECT p1.session_id,
          p1.user_id,
          p1.path,
          datediff('seconds', p1.time, p2.time) AS pageview_time
   FROM ordered_pageviews p1
   JOIN ordered_pageviews p2 ON p1.session_id = p2.session_id
   AND p1.user_id = p2.user_id
   AND p1.rn + 1 = p2.rn
   -- optional filter to limit paths you're analyzing
   -- WHERE p1.path !~ '/app'
   GROUP BY p1.session_id,
            p1.user_id,
            p1.path,
            p1.time,
            p2.time) a
GROUP BY 1
ORDER BY 2 DESC;
For steps to set up a chart of this in Heap, see How do I see time spent on page?

Average Events per Session

The average events per session metric can give you insight into aggregate behavior.

This query can also be modified to analyze behavior across segments and over time, allowing you to answer questions such as: Does this number change with a product launch or tweaks to the onboarding flow? Do different types of users interact differently? Changes in the average can prompt investigation into what is really happening within your app.

Also available in Mode: Average Events per Session

SELECT Round(Count(DISTINCT event_id) :: DECIMAL / Count(DISTINCT session_id), 2 
       ) AS 
       "average" 
  FROM   main_production.all_events 
  -- modify your date range here 
  WHERE  TIME = Dateadd('day', -30, Getdate())

Average Events per Session Over Time

To modify this query to calculate the average number of events per session over the past year or past six months, update the WHERE statement to include the time range and add a GROUP BY clause to address the granularity.

Also available in Mode: Average Events Per Session Over Time

SELECT TO_CHAR(main_production.all_events.time, 'YYYY-MM') as month, ROUND(COUNT(DISTINCT event_id)::DECIMAL / COUNT(DISTINCT session_id), 2) AS "average"
  FROM main_production.all_events
  -- modify your date range here
  WHERE time = DATEADD('month', -6, getdate())
  GROUP BY 1

Average Events per Session by User Property

To calculate this, join the all events table on the user table and add a GROUP BY clause based on the user properties you want to use for segmentation.

Also available in Mode: Average Events Per Session by Plan

SELECT user_property, ROUND(COUNT(DISTINCT event_id)::DECIMAL / COUNT(DISTINCT session_id), 2) AS "average"
  FROM main_production.all_events JOIN main_production.users 
    ON main_production.all_events.user_id = main_production.users.user_id
  -- modify your date range here
  WHERE time = DATEADD('day', -30, getdate())

Average Sessions per User

The average number of sessions per user can be used in the same way as average events per session and can be modified in the same manner. To find the average number of sessions per user, divide the number of unique sessions by the number of unique users across the time period.

Also available in Mode: Average Sessions Per User

SELECT ROUND(COUNT(DISTINCT session_id)::DECIMAL / COUNT(DISTINCT user_id), 2)
  FROM main_production.sessions
  -- modify the date range here
  WHERE time = DATEADD('day', - 30, GETDATE())
For steps to set up a chart of this in Heap, see See the average event count per user over time and select Session as your event.

Average Sessions Before Signup

(or any other event)

WITH first_signup AS 
(SELECT user_id, min(time) AS first_time
FROM main_production.sign_up 
GROUP BY 1),
sessions_pre_signup AS 
( SELECT sessions.user_id, count(distinct session_id) AS total_sessions
FROM main_production.sessions 
JOIN first_signup ON first_signup.user_id = sessions.user_id AND sessions.time <= first_signup.first_time="first_signup.first_time" group="GROUP" by="BY" 1)="1)" select="SELECT" avg(total_sessions)="AVG(total_sessions)" from="FROM" sessions_pre_signup="sessions_pre_signup"></=>
For steps to set up a chart of this in Heap, see See the average event count per user over time, select Session as your event, and filter for users who haven't signed up yet (you may need to create a new segment for this group of users).

Average Property Value

SELECT [property_to_group_by], 
         Avg([numeric_property]) 
  FROM   main_production.[event_name] 
  GROUP  BY 1 
  ORDER  BY 1

Bounce Rate

WITH data as (
  SELECT time::date as day, session_id, count(*) as pageviews
  FROM heap_production.pageviews
  GROUP BY 1,2
)
SELECT day, count(case when pageviews = 1 then 1 else null end) / count(*) as bounce_rate
from data
GROUP BY 1
ORDER BY 1
For steps to set up a bounce rate chart in Heap, see Acquisition analysis overview.

Funnels

This section will go through Heap Connect queries that enable you to conduct funnel analysis by measuring all types of funnels. Funnels are fundamental to understanding any flow through your site, from onboarding to filling out forms, to virtually any step users are expected to take within your product.

In this section, we will cover:

  • Building a basic funnel
  • Modifying the date range
  • Limiting the funnel to one session
  • Increasing flexibility (steps can occur in any order)
  • Conversion rate over time (week over week)
For steps to do funnel analysis in Heap, see Funnel analysis overview.

The Basic Funnel

The funnel below can be used to mirror the funnel in Heap’s dashboard. It calculates the number of users who have completed Step 1 within the time frame, the number of users who have completed Step 2, and then Step 3. It's both sequential and time-constrained, though there are many modifications you may wish to make, as shown below.

Also available in Mode: Onboarding Flow - Funnel Report

WITH e1
  AS (
        SELECT DISTINCT user_id
                ,1 AS step_1
                ,MIN(time) AS step_1_time
        -- change to table name of event 1 in your funnel
        FROM main_production.step_1 AS e1
        WHERE time BETWEEN '2016-04-13'
                        AND '2016-04-21'
        GROUP BY 1
        )
        ,e2
  AS (
        SELECT e1.user_id
                ,1 AS step_2
                ,MIN(time) AS step_2_time
        -- change to table name of event 2 in your funnel
        FROM main_production.step_2 AS e2
        INNER JOIN e1 ON e1.user_id = e2.user_id
        WHERE time BETWEEN step_1_time
                        AND '2016-04-21'
        GROUP BY 1
        )
        ,e3
  AS (
        SELECT e3.user_id
                ,1 AS step_3
                ,MIN(time) AS step_3_time
        -- change to table name of event 3 in your funnel
        FROM main_production.step_3 AS e3
        INNER JOIN e2 ON e2.user_id = e3.user_id
        WHERE TIME BETWEEN step_2_time
                        AND '2016-04-21'
        GROUP BY 1
        )
        
  SELECT SUM(step_1) AS "Step 1"
        ,SUM(step_2) AS "Step 2"
        ,SUM(step_3) AS "Step 3"
        ,ROUND(SUM(step_2)/SUM(step_1)::DECIMAL, 2) as step_1to2_conversion
        ,ROUND(SUM(step_3)/SUM(step_2)::DECIMAL,2) as step_2to3_conversion
        ,ROUND(SUM(step_3)/SUM(step_1)::DECIMAL, 2) as overall_conversion
  FROM (
        SELECT e1.user_id
                ,step_1
                ,step_1_time
                ,step_2
                ,step_2_time
                ,step_3
                ,step_3_time
        FROM e1
        LEFT JOIN e2 ON e1.user_id = e2.user_id
        LEFT JOIN e3 ON e2.user_id = e3.user_id
        )

Breaking it Down

The goal of this query is to get both the counts and the percentage that make it from step to step.

E1:
This table is selecting all the users and the first time they completed event 1 during the given time range.

E2:
This selects all the users who completed Step 1 and the first time they did Step 2 between the time they did Step 1 and the end date.

E3:
This selects all the users who completed Step 1 one and Step 2 and the first time they did Step 3 between the time they did Step 2 and the end date.

The Counts and Percentages:
This query first creates a table with the user ID, the time of each step was completed, and 1 or 0 based on whether or not the user completed each step. It calculates the total number of users who completed each step (each step value). Then the query calculates the conversion rate between Step 1 and Step 2, Step 2 and Step 3, and the overall conversion rate.

Putting it all together, and you a table with the following results:

| Step 1 Total | Step 2 Total | Step 3 Total | Step 1 → Step 2 % | Step 2 → Step 3 % | Step 1→Step 3 % |

To take advantage of Mode’s custom visualizations, we tweaked our query to return:

| Step 1 | Count |

| Step 2 | Count |

| Step 3 | Count |

Funnels Grouped by an Event-Level Property

Each event has a set of properties ranging from attributes Heap captures automatically, like referrer, UTM parameters, and device type, as well as any properties you set using snapshots, heap.addEventProperties, or heap.track.

To view a breakdown of the conversion rate based on these event-level properties, only a couple of modifications need to be made.

First, select the event that contains the event-level property you would like to analyze. In this example, this event-level property is a property of event 1. This query selects the utm_source (the event-level property) for the first event so that the event property can be included in the GROUP BY clause of the funnel.

In the funnel, include the event-level property in the SELECT statement and GROUP BY clause. You can also join this on the users table (JOIN main_production.users on main_production.users.user_id=e1.user_id ) to analyze your conversion rates based on user-level properties as well.

This funnel is valuable for many reasons. For example, eCommerce sites can send the number of items in the cart as an event-level property for the beginning of the checkout process. Using this query, you can analyze the correlation between conversion rates and number of items in the cart.

Also available in Mode: Onboarding Funnel Group by Event Level Property

For steps to do funnel analysis in Heap, see Funnel analysis overview and apply a group by with any event-level property.
WITH e1
  AS (
        SELECT DISTINCT e1.user_id
                ,1 AS step_1
                ,min(TIME) AS step_1_time
                ,utm_source
        FROM main_production.define_event AS e1
        INNER JOIN (
                SELECT user_id
                        ,min(TIME) AS mintime
                FROM main_production.define_event
                WHERE DATE (TIME) BETWEEN '2016-04-13'
                                AND '2016-04-20'
                GROUP BY 1
                ) AS min_value ON e1.user_id = min_value.user_id
                AND e1.TIME = min_value.mintime
        WHERE DATE (TIME) BETWEEN '2016-04-13'
                        AND '2016-04-20'
        GROUP BY 1
                ,4
        )
  ,e2 AS (
        SELECT e1.user_id
                ,1 AS step_2
                ,MIN(time) AS step_2_time
        -- change to table name of event 2 in your funnel
        FROM main_production.step_2 AS e2
        INNER JOIN e1 ON e1.user_id = e2.user_id
        WHERE time BETWEEN step_1_time
                        AND '2016-04-21'
        GROUP BY 1
        )
        ,e3
  AS (
        SELECT e3.user_id
                ,1 AS step_3
                ,MIN(time) AS step_3_time
        -- change to table name of event 3 in your funnel
        FROM main_production.step_3 AS e3
        INNER JOIN e2 ON e2.user_id = e3.user_id
        WHERE TIME BETWEEN step_2_time
                        AND '2016-04-21'
        GROUP BY 1
        )

  SELECT utm_source
        ,sum(step_1) AS "Step 1"
        ,sum(step_2) AS "Step 2"
        ,sum(step_3) AS "Step 3"
  FROM (
        SELECT e1.user_id
                ,step_1
                ,step_1_time
                ,step_2
                ,step_2_time
                ,step_3
                ,step_3_time
                ,utm_source
        FROM e1
        LEFT JOIN e2 ON e1.user_id = e2.user_id
        LEFT JOIN e3 ON e2.user_id = e3.user_id
        )
  GROUP BY 1

Restricting Funnels to One Session

In this funnel, user ID and session ID are aligned to make sure that funnel spans across one session. There are a couple of tweaks needed in each step of the query.

E1:
First we need to gather the user ID, session ID, and event time for step one data. Make sure you are grouping by both user ID and session ID so that all of the users’ sessions that occur in the time frame are included in the funnel.

E2 and E3:
Next, you want to modify the second part of this query to join step 2 and step 3 data based on both user ID and session ID. Adding an and statement within the join allows us to join this data successfully. Make sure you continue to include session ID in your SELECT and GROUP BY clauses.

Conversion Rate:
To calculate the conversion rate, we first join these three event tables and calculate the number of users who have completed steps one, two, and three in each month. Then we are going to calculate the conversion rate for each month and group by month.

Also available in Mode: Onboarding Funnel limited to one session

For steps to do funnel analysis in Heap, see Funnel analysis overview and set the date range granularity to a single session.
WITH e1 AS 
  ( 
                SELECT DISTINCT user_id, 
                                session_id , 
                                Min(time) AS step_1_time 
                                -- change to table name of event 1 in your funnel 
                FROM            main_production.step_1 AS e1 
                WHERE           time BETWEEN '2016-04-13' AND             '2016-04-21' 
                GROUP BY        1, 
                                2 ), e2 AS 
  ( 
           SELECT     e1.user_id, 
                      e1.session_id , 
                      Min(time) AS step_2_time 
                      -- change to table name of event 2 in your funnel 
           FROM       main_production.step_2 AS e2 
           INNER JOIN e1 
           ON         e1.user_id = e2.user_id 
           AND        e1.session_id = e2.session_id 
           WHERE      time BETWEEN step_1_time AND        '2016-04-21' 
           GROUP BY   1, 
                      2 ), e3 AS 
  ( 
           SELECT     e3.user_id, 
                      e3.session_id , 
                      Min(time) AS step_3_time 
                      -- change to table name of event 3 in your funnel 
           FROM       main_production.step_3 AS e3 
           INNER JOIN e2 
           ON         e2.user_id = e3.user_id 
           AND        e2.session_id = e3.session_id 
           WHERE      time BETWEEN step_2_time AND        '2016-04-21' 
           GROUP BY   1, 
                      2 ) 
  SELECT count(DISTINCT step_1 as step_one ,count(DISTINCT step_2) AS step_two ,count(DISTINCT step_3) AS step_three ,round(count(DISTINCT step_3)/count(DISTINCT step_1)::decimal, 2) AS conversion_rate
       -- calculate additional percentages here 
       FROM ( 
                 SELECT    e1.user_id AS step_1 , 
                           e1.session_id , 
                           step_1_time , 
                           e2.user_id AS step_2 , 
                           step_2_time , 
                           e3.user_id AS step_3 , 
                           step_3_time 
                 FROM      e1 
                 LEFT JOIN e2 
                 ON        e1.user_id = e2.user_id 
                 AND       e1.session_id = e2.session_id 
                 LEFT JOIN e3 
                 ON        e2.user_id = e3.user_id 
                 AND       e2.session_id = e3.session_id 
                 ORDER BY  1 ) 1

Conversion Rate Over Time

This report gives the user a window to complete the funnel and tracks how your conversion changes over time.

Now available in Mode: Rolling Conversion Rate

For steps to see conversion rate over time in Heap, see Calculate conversion rate between two events.
WITH e1 AS(
  SELECT DISTINCT user_id
        ,1 AS step_1
        ,MIN(time) AS step_1_time
        ,TO_CHAR(e1.time, 'YYYY-MM') as "month"
  FROM main_production.[event_1] AS e1
  WHERE TIME BETWEEN '2015-04-01' --start time
                AND '2016-04-01'  --end time
  GROUP BY 1,4
  ),

  e2 as (
  SELECT e1.user_id, 1 AS step_2, MIN(e2.time) AS step_2_time, e1.step_1_time
    FROM main_production.[event_2] AS e2
    JOIN e1 ON e1.user_id=e2.user_id AND e1.step_1_time = DATEADD('day', -30, e2.time)
    WHERE time BETWEEN step_1_time AND '2016-04-01'
    GROUP BY 1, 4
  ),
  e3 as (
  SELECT e3.user_id, 1 AS step_3, MIN(time) AS step_3_time, e2.step_1_time
    FROM main_production.[event-3] AS e3
    JOIN e2 ON e2.user_id=e3.user_id AND e2.step_1_time = DATEADD('day', -30, e3.time)
    WHERE time BETWEEN step_2_time AND '2016-04-01'
    GROUP BY 1, 4
  )

  SELECT MONTH, ROUND(SUM(step_3)/SUM(step_1)::DECIMAL, 2) AS "Conversion Rate"
  FROM(
    SELECT e1.user_id, step_1, e1.step_1_time, step_3, step_3_time, e1.month
      FROM e1 LEFT JOIN e2 ON e1.user_id = e2.user_id AND e1.step_1_time=e2.step_1_time 
      LEFT JOIN e3 ON e2.user_id= e3.user_id AND e2.step_1_time=e3.step_1_time
  )
  GROUP BY 1
  ORDER BY 1

Time to Conversion / Average Time to Event

Time can be an interesting metric to analyze. The time it takes for a user to make their first purchase from the date first seen, or the time it takes a user to fill out a form from the first field to submit, or the time between two purchases can be measured and optimized. You can analyze which behaviors, user properties, or first touch properties are associated with faster conversion rates.

This example measures the time it takes users to purchase an item but can be easily adapted to measure a slew of different events, and segmented by a variety of user and event-level properties.

Also available in Mode: Average Time to Event

For steps to set up a chart of this in Heap, see See how long it takes users to convert.
SELECT avg(duration) AS "Average hours until Purchase"
  FROM (
        SELECT user_id
                ,datediff('milliseconds', start_time, end_time)::DECIMAL / 1000 / 60 / 60 / AS "duration"
        FROM (
                SELECT main_production.users.user_id
                        ,min(joindate) AS start_time
                        ,min(main_production.purchase.time) AS end_time
                FROM main_production.sign_up
                JOIN main_production.users ON main_production.purchase.user_id = main_production.users.user_id
                GROUP BY 1
                )
        )

Path Analysis

This query allows you to track the most common flows or ‘paths’ into and out of your product via Heap Connect.

As is, the SQL selects the first 5 events of every session consisting of at least five events. It then aggregates the number of users who have completed each flow and displays the top 20 most common paths. For example, looking at the table below, we see the most common user flow is
home-view_page -> signup_view_page -> signup_enter_name -> signup_enter_email -> signup_enter_password and that this particular flow has occured 2791 times.

We recommend these instructions in cases where you want to filter for users who meet certain criteria based on data that only exists in your warehouse. For all other paths analysis cases, we recommend using the Journeys chart type in Heap. See Journeys overview to get started.

Heap is unique in the flexibility offered when defining events. A given event_id across all Redshift or BigQuery tables corresponds to a unique event recorded by Heap. That said, the same event_id may exist in multiple event tables. This is because Heap provides the flexibility to create multiple event definitions that correspond to the same raw event. For instance, you may define the following two events in the product:

  • Click CTA defined as Click on .cta
    Click CTA - Homepage defined as Click on .cta with a filter where Path equals /

If a user clicks the CTA on the homepage, a new event will be recorded on both event tables because it corresponds with both event definitions. As a result, two events with the same event_id will be included in the all_events table, since it contains every recorded instance of all defined and custom events.

In order to select the events most valuable to you, when multiple events are firing at the same time, this query selects the events that are the most specific. This is based on the heuristic that events with more specificity are fired less overall then events with more general definitions. If you look at the example mentioned above, although two events are firing when a user clicks .cta on the homepage, this query would only include the Click CTA - Homepage event, rather than the global Click CTA event, allowing you to have more insight into your users’ paths.

Also available in Mode: Path Analysis

In Mode’s visualization, each block of color is representative of a particular event; it’s size is relative to the percentage of users who have completed that event. By hovering over a particular chain of events starting at the center of the circle, you can trace the relative and absolute percentage of users who have followed the path within your app.

WITH event_count
  AS (
        SELECT event_table_name
                ,count(*) AS cardinality
        FROM main_production.all_events
        WHERE TIME  dateadd('day', - 30, getDate())
        GROUP BY 1
        )
        ,all_events
  AS (
        SELECT DISTINCT event_id
                ,main_production.all_events.user_id AS user_id
                ,main_production.all_events.event_table_name AS event_name
                ,main_production.all_events.TIME AS occurred_at
                ,cardinality AS number
        FROM main_production.all_events
        LEFT JOIN event_count ON main_production.all_events.event_table_name =event_count.event_table_name
        ORDER BY 1
                ,2
        )
        ,events
  AS (
        SELECT all_events.user_id
                ,all_events.event_id
                ,event_name
                ,occurred_at
                ,cardinality
        FROM all_events
        INNER JOIN (
                SELECT event_id
                        ,user_id
                        ,min(number) AS cardinality
                FROM all_events
                GROUP BY 1
                        ,2
                ) AS event ON all_events.number = event.cardinality
                AND all_events.event_id = event.event_id
                AND all_events.user_id = event.user_id
        ORDER BY 1
                ,2
                ,4
        )
  SELECT e1
        ,e2
        ,e3
        ,e4
        ,e5
        ,COUNT(*) AS occurrances
  FROM (
        --Pivot out first five events in each session
        SELECT user_id
                ,session
                ,MAX(CASE 
                                WHEN event_number = 1
                                        THEN event_name
                                ELSE NULL
                                END) AS e1
                ,MAX(CASE 
                                WHEN event_number = 2
                                        THEN event_name
                                ELSE NULL
                                END) AS e2
                ,MAX(CASE 
                                WHEN event_number = 3
                                        THEN event_name
                                ELSE NULL
                                END) AS e3
                ,MAX(CASE 
                                WHEN event_number = 4
                                        THEN event_name
                                ELSE NULL
                                END) AS e4
                ,MAX(CASE 
                                WHEN event_number = 5
                                        THEN event_name
                                ELSE NULL
                                END) AS e5
        FROM (
                -- Find event number in session
                SELECT z.*
                        ,ROW_NUMBER() OVER (
                                PARTITION BY user_id
                                ,session ORDER BY occurred_at
                                ) AS event_number
                FROM (
                        -- Sum breaks to find sessions
                        SELECT y.*
                                ,SUM(BREAK) OVER (
                                        ORDER BY user_id
                                                ,occurred_at ROWS UNBOUNDED PRECEDING
                                        ) AS session
                        FROM (
                                -- Add flag if last event was more than 10 minutes ago
                                SELECT x.*
                                        ,CASE 
                                                WHEN last_event IS NULL
                                                        OR occurred_at = last_event + INTERVAL '10 MINUTE'
                                                        THEN 1
                                                ELSE 0
                                                END AS BREAK
                                FROM (
                                        -- Find last event
                                        SELECT *
                                                ,LAG(occurred_at, 1) OVER (
                                                        PARTITION BY user_id ORDER BY occurred_at
                                                        ) AS last_event
                                        FROM events
                                        ) x
                                ) y
                        ) z
                ) a
        WHERE event_number <= 1="1" 5="5" 6="6" 20="20" group="GROUP" by="BY" ,2=",2" )=")" final="final" ,3=",3" ,4=",4" ,5=",5" order="ORDER" desc="DESC" limit="LIMIT"></=>

Attribution

This section will go through Heap Connect queries that enable you to measure attribution to better understand where your users are coming from.

The influence chart in Heap allows you to analyze what behaviors or marketing channels have influenced behavior of interest. For instructions to set up an influence chart, see Influence analysis overview. For more info about how attribution is measured in Heap, see Understand attribution.

Top Referrers

This queries the top N referrers and the number of sessions they generate.

Mode Analytics

First Touch Properties

This query generates a table that matches Heap’s Users view. You can export and select all the user-level properties and all the first touch properties.

How it works

This query starts off by calculating the first session for each user, and joins the first sessions data with the users table based on the unique user ID. From here, each user-level property and session-level property can be selected.

Applications

In addition to generating a list of all users, this query can be modified to contain the user-level properties of a subset of users. You can generate a list of users who were first seen during a specific period of time by adding in a WHERE clause with a date range. This table can be joined to other reports to track a particular cohorts experience through your product. You can generate a list of email addresses for a particular email blast targeted towards users who joined based on a certain UTM campaign.

Also available in Mode: First Touch Properties

SELECT all_sessions.user_id
        ,first_session.first_time AS first_seen
        ,referrer
        ,utm_source
        ,utm_campaign
        ,utm_medium
        ,utm_content
  FROM main_production.sessions AS all_sessions
  INNER JOIN (
        SELECT user_id
                ,MIN(time) AS first_time
        FROM main_production.sessions
        GROUP BY user_id
        ) AS first_session ON all_sessions.user_id = first_session.user_id
        AND all_sessions.time = first_session.first_time
  INNER JOIN main_production.users AS users ON all_sessions.user_id = users.user_id
--[optional] insert where clause here
  ORDER BY 3 DESC

First Touch Property by User Count

This query returns the number of users that fall into each category. In this case, the number of users who have come in via each UTM source will be displayed. This can be adapted and broken down by any first touch or user-level properties. For example, this query can be modified to return the number of users who have the initial referrer Facebook and the initial device type as mobile, which in turn can help you make decisions about successful ad campaigns.

Also available in Mode: Users per UTM Source

SELECT utm_source, COUNT(distinct user_id)
  FROM main_production.sessions AS all_sessions
  INNER JOIN (
        SELECT user_id
                ,MIN(time) AS first_time
        FROM main_production.sessions
        GROUP BY user_id
        ) AS first_session ON all_sessions.user_id = first_session.user_id
        AND all_sessions.time = first_session.first_time
  INNER JOIN main_production.users AS users ON all_sessions.user_id = users.user_id
  GROUP BY 1

Multi-Touch Attribution

Users come in via many channels, and the campaign that drives the first touch doesn't always match with the campaign on the conversion event. Because Heap automatically captures the referrer and UTM parameters for each session, you can build a query that compares the channel for each session a user has. This query compares the UTM source of the first touch with the UTM source on the session with the conversion event, and calculates the number of users who fall into each category.

How it works

First, we calculate the UTM source of the first touch by finding the session with the earliest start time, then join the first session with the conversion event based on the user_id . Finally, we select the initial UTM campaign, and the conversion event’s UTM campaign and the breakdown of users. Make sure to include a GROUP BY statement to categorize your users.

Also available in Mode: Multitouch Attribution

SELECT initial.utm_campaign AS "INITIAL"
        ,CASE 
                WHEN event_2.utm_campaign IS NULL
                        THEN 'None'
                ELSE event_2.utm_campaign
                END AS "LAST TOUCH"
        ,count(DISTINCT initial.user_id) AS "Number of Users"
  FROM (
        SELECT all_sessions.user_id
                ,CASE 
                        WHEN all_sessions.utm_campaign IS NULL
                                THEN 'None'
                        ELSE all_sessions.utm_campaign
                        END
        FROM main_production.sessions AS all_sessions
        INNER JOIN (
                SELECT user_id
                        ,MIN(time) AS first_time
                FROM main_production.sessions
                GROUP BY user_id
                ) AS first_session ON all_sessions.user_id = first_session.user_id
                AND all_sessions.time = first_session.first_time
        ) AS initial
  JOIN main_production.conversion_event AS event_2 
    ON initial.user_id = event_2.user_id
  GROUP BY 1
        ,2
  ORDER BY 3 DESC

Retention

This Heap Connect query calculates session to session retention analysis on a monthly basis. Each user is segmented into a cohort based on the month of their join date, and is counted in the total retained on a monthly basis if they have a session during each month.

For steps to conduct retention analysis in Heap, see Retention analysis overview.

Also available in Mode: Retention Cohorted by Joindate

In this chart, each user is assigned to a cohort based on their join date. Each cohort is represented by a different color (e.g. the users who joined in July 2015 are in the 2015-07 or orange cohort). It then traces the percentage of users who have at least one additional session in the following months. If we continue to look at the orange cohort, this chart shows us that .16 or 16% of users came back in July and had an additional session, .06 or 6% of users came back in August for an additional session, .04 or 4% came back in September, and so on.

You can compare different cohorts by adjusting the query to contain the distance from the join date rather than the month.

Also available in Mode: Comparing Retention for Cohorts by Joindate

This retention report cohorts users in the same way as the previous retention report; the orange cohort is still mapped to users who first joined in July. However, rather than mapping the percentage of users retained to a particular month, this query uses relative time to track retention across periods (same month as join date, one month later, two months later, etc.). It then stacks each cohort to allow you to analyze how the join date is correlated with retention. If you look at Month 0, you'll see that users who joined in July 2015 have a retention rate of 16% during that first month, whereas users who joined in February 2016 have a 12% retention rate.

There is also a slight caveat in this view. You will notice that several cohorts have 0% retained users. In some cases, this is because there is no data five months after May 2016.

Month-to-Month Retention

--get a cohort based on join date
  WITH new_user_activity
  AS (
        SELECT activity.user_id
                ,activity.TIME AS DATE
        FROM main_production.sessions AS activity
        INNER JOIN users ON users.user_id = activity.user_id
                AND users.joindate = activity.TIME
        WHERE date_part('year', joindate)  2014
        )
        --number of users in cohort
        ,cohort_active_user_count
  AS (
        SELECT to_char(DATE, 'YYYY-MM') AS DATE
                ,count(DISTINCT user_id) AS count
        FROM new_user_activity
        GROUP BY 1
        )
  --change format to change granularity
  SELECT joindate
        ,period
        ,new_users
        ,retained_users
        ,retention
  FROM (
        SELECT to_char(new_user_activity.DATE, 'YYYY-MM') AS "joindate"
                ,'Month ' ||
                --change month to change granularity
                datediff('month', new_user_activity.DATE, future_activity.TIME) AS period
                ,max(cohort_size.count) AS new_users
                ,-- all equal in group
                count(DISTINCT future_activity.user_id) AS retained_users
                ,round(count(DISTINCT future_activity.user_id) / max(cohort_size.count)::DECIMAL, 2) AS retention
        FROM new_user_activity
        LEFT JOIN main_production.sessions AS future_activity ON new_user_activity.user_id = future_activity.user_id
                AND new_user_activity.DATE = future_activity.TIME
        LEFT JOIN cohort_active_user_count AS cohort_size ON to_char(new_user_activity.DATE, 'YYYY-MM') = cohort_size.DATE
        GROUP BY 1
                ,2
        ) t
  WHERE period IS NOT NULL
  ORDER BY 1
        ,2

Joining Heap with Revenue Data

One of the most powerful aspects of Heap Connect is the ability to join a complete set of user interactions with other data sources to see revenue data, such as a user’s order history. This section will cover topics such as calculating the average revenue per user and the lifetime value of a user based on their interactions in your product and attribution channels. You can check out a complete list of all queries in Common Queries.

ARPU per UTM Source

This query combines Heap’s attribution data with order history data, allowing you to compare how each source is correlated with a user’s downstream behavior.

Also available in Mode: ARPU by UTM Source

WITH first_touch 
     AS (SELECT all_sessions.user_id, 
                identity, 
                first_session.first_time AS first_seen, 
                utm_source 
         --add any user level properties or session level properties here 
         FROM   main_production.sessions AS all_sessions 
                inner join (SELECT user_id, 
                                   Min(TIME) AS first_time 
                            FROM   main_production.sessions 
                            GROUP  BY user_id) AS first_session 
                        ON all_sessions.user_id = first_session.user_id 
                           AND all_sessions.TIME = first_session.first_time 
                inner join main_production.users AS users 
                        ON all_sessions.user_id = users.user_id) 
  SELECT CASE 
         WHEN utm_source IS NULL THEN 'none' 
         ELSE utm_source 
       END, 
       Round(SUM(order_total) :: DECIMAL, 2) AS total_revenue, 
       Round(SUM(order_total) :: DECIMAL / Count(DISTINCT customer_id) :: 
             DECIMAL, 2) 
                                             AS avg_per_customer 
  FROM   heap.TRANSACTION 
       join first_touch 
         ON customer_id = identity 
  GROUP  BY 1

Monthly ARPU per UTM Source

With a slight tweak to your query, you can see how the UTM source is correlated with the ARPU on a month-to-month basis.

Also available in Mode: Monthly ARPU per UTM

WITH first_touch 
     AS (SELECT all_sessions.user_id, 
                identity, 
                first_session.first_time AS first_seen, 
                utm_source 
         --add any user level properties or session level properties here 
         FROM   main_production.sessions AS all_sessions 
                inner join (SELECT user_id, 
                                   Min(TIME) AS first_time 
                            FROM   main_production.sessions 
                            GROUP  BY user_id) AS first_session 
                        ON all_sessions.user_id = first_session.user_id 
                           AND all_sessions.TIME = first_session.first_time 
                inner join main_production.users AS users 
                        ON all_sessions.user_id = users.user_id) 
  SELECT utm_source, 
       CASE 
         WHEN Length(Substring(order_date, 0, Position('/' IN order_date))) = 2 
       THEN 
         '20' 
         || 
       Right(order_date, 2) 
         || 
       '-' 
         || 
       Substring(order_date, 0, Position('/' IN order_date)) 
         ELSE '20' 
              || Right(order_date, 2) 
              ||'-0' 
              || Substring(order_date, 0, Position('/' IN order_date)) 
       END                                   AS order_month, 
       Count(DISTINCT customer_id), 
       Round(SUM(order_total) :: DECIMAL, 2) AS total_revenue, 
       Round(SUM(order_total) :: DECIMAL / Count(DISTINCT customer_id) :: 
             DECIMAL, 2) 
                                             AS avg_per_customer 
  FROM   heap.TRANSACTION 
       join first_touch 
         ON customer_id = identity 
  WHERE  utm_source IS NOT NULL 
  GROUP  BY 1, 
            2 
  ORDER  BY 2

ARPU by Behavior

You can also combine Heap data with revenue data to determine if certain actions or if the number of times an action is performed is correlated with higher returns. This report segments users into two categories: users who have uploaded files, and users who haven't uploaded files. It then calculates the ARPU for each segment. We can see here that the ARPU is roughly seven times higher for users who have uploaded files.

Also available in Mode: ARPU by Behavior

WITH user_info 
     AS (SELECT identity, 
                users.user_id, 
                SUM(order_total) AS total_revenue, 
                Count(event_id)  AS event_count 
         FROM   main_production.users AS users 
                join heap.TRANSACTION 
                  ON identity = customer_id 
                left join main_production.dashboard_galleries_click_add_new_files AS event 
                       ON users.user_id = event.user_id 
         GROUP  BY 1, 
                   2) 
  SELECT CASE 
         WHEN event_count = 0 THEN 'has not uploaded files' 
         ELSE 'has uploaded files' 
       END, 
       Round(SUM(total_revenue) :: DECIMAL / Count(DISTINCT user_id), 2) AS 
       avg_revenue_per_user 
  FROM   user_info 
  GROUP  BY 1

Churned Customer Revenue

Heap Connect allows you to get a better understanding of your churned customers as well. You can answer questions like “Where did they come from?”, “What did they do in my product?”, and “How much revenue did a customer bring in before churning?”. The table below combines Heap session data, server-side events, and revenue data to enhance your understanding of churned customers.

Also available in Mode: Churned Customer Behavior

WITH first_touch 
     AS (SELECT all_sessions.user_id, 
                identity, 
                first_session.first_time AS first_seen, 
                utm_source 
         --add any user level properties or session level properties here 
         FROM   main_production.sessions AS all_sessions 
                inner join (SELECT user_id, 
                                   Min(TIME) AS first_time 
                            FROM   main_production.sessions 
                            GROUP  BY user_id) AS first_session 
                        ON all_sessions.user_id = first_session.user_id 
                           AND all_sessions.TIME = first_session.first_time 
                inner join main_production.users AS users 
                        ON all_sessions.user_id = users.user_id) 
  SELECT utm_source, 
       Round(SUM(order_total) :: DECIMAL, 2) 
       AS churned_customer_total_value, 
       Round(Count(DISTINCT order_id) :: DECIMAL / Count(DISTINCT user_id), 2) 
       AS 
       avg_orders_per_user, 
       Round(SUM(order_total) :: DECIMAL / Count(DISTINCT first_touch.identity), 
       2) AS 
       avg_ltv 
  FROM   first_touch 
       join heap.TRANSACTION 
         ON customer_id = first_touch.identity 
       join main_production.account_modification 
         ON first_touch.identity = main_production.account_modification.identity 
  WHERE  account_change = 'cancel' 
       AND utm_source IS NOT NULL 
  GROUP  BY 1

Modeling Your Data

For info on how to dig into user and session info in Heap, see Users overview.

Creating An Enhanced Users Table

This query provides an example of how to recreate the users table, adding the Initial User Properties (ex: Initial Referrer, Initial Browser, etc) that are displayed in your Heap account.

BEGIN;
DROP TABLE IF EXISTS enhanced_users;
CREATE TABLE enhanced_users AS
SELECT u.*
       ,s.time AS date_first_seen
       ,s.referrer AS initial_referrer
       ,s.utm_source AS initial_utm_source
       ,s.utm_medium AS initial_utm_medium
       ,s.utm_campaign AS initial_utm_campaign
       ,s.city AS initial_city
       ,s.region AS initial_region
       ,s.country AS initial_country
       ,s.landing_page AS initial_landing_page
       ,s.browser AS initial_browser
       ,s.device_type AS initial_device_type
       ,s.platform AS initial_platform
       ,s.ip AS initial_ip
FROM heap_production.sessions s
JOIN heap_production.users u ON s.user_id = u.user_id
JOIN
  (SELECT user_id, min(TIME) AS first_session_time 
    FROM heap_production.sessions
    GROUP BY user_id) AS first_session 
    ON s.user_id = first_session.user_id
        AND first_session_time = s.time;
END;

Sessionizing Client-Side and Server-Side Events

BEGIN;
-- Prepending a table with # makes a temp table. Change this if you want a table to persist.
DROP TABLE IF EXISTS #sessions;
CREATE TABLE #sessions AS
SELECT user_id 
      ,min(time) over (partition by calc_session_id) as session_time
      ,time as event_time
      ,calc_session_id
      ,split_part(calc_session_id, '_', 2) as visit_number_in_range
      ,event_table_name
      ,first_value(path) over (partition by calc_session_id) as landing_page
      ,first_value(utm_source) over (partition by calc_session_id) as utm_source
      ,first_value(utm_medium) over (partition by calc_session_id) as utm_medium
      ,first_value(utm_campaign) over (partition by calc_session_id) as utm_campaign
      ,path
      ,total_price
      ,subtotal_price 
FROM
(SELECT *
      ,user_id || '_' || sum(new_session) over (partition by user_id order by time rows unbounded preceding ) AS calc_session_id
FROM
(SELECT e.*
       ,p.path
       ,p.utm_source
       ,p.utm_medium
       ,p.utm_campaign
       ,o.total_price
       ,o.subtotal_price
       -- Change your session definition here i.e. 30 * 60 = 30 minutes so 60 * 60 = 1 hour.
       ,case when extract(epoch from e.time) - lag(extract(epoch from e.time)) over (partition by e.user_id order by e.time) = 30 * 60 then 1 else 0 end as new_session
FROM heap_production.all_events e
LEFT JOIN heap_production.pageviews p ON p.user_id = e.user_id
AND p.event_id = e.event_id
AND p.session_id = e.session_id
LEFT JOIN heap_production.confirmed_order o ON o.user_id = e.user_id
AND o.event_id = e.event_id
AND o.session_id = e.session_id
-- This example focuses on pageviews and a conversion event called `confirmed_order` from Heap's Shopify integration.
-- If you extend this model to include other tables and want properties from their tables, follow the left join pattern above and add the event_table_name to filter below.
WHERE (event_table_name = 'pageviews' or event_table_name = 'confirmed_order') 
-- Change your date range here. 
-- AND e.time  current_date - interval '1 week'
ORDER BY time) s)
ORDER BY time;
END;
Last updated