Heap Connect: Identity Resolution (S3 only)

This feature is only available to customers on the Premier plan or the Pro plan as an add-on. To upgrade, contact your Customer Success Manager or sales@heap.io

This section is only relevant for Heap Connect instances using S3.

What is identity resolution?

Heap’s identify API allows you to create a single, cohesive view of a user across devices, browsers, and domains. When a user is identified via Heap’s identify API, their anonymous user_id is updated to a new user_id, which is a hash of the identity. Once this is set up, when this user is identified on another device or browser, this tells us that these users are the same, and that we should join their data together.

Internally, we call this a “user migration” because we are migrating user and event data from one record to another. In other words, we are resolving data from two users into one identity. This is identity resolution, at a high-level.

In the Heap app, we handle this for you. In Redshift, Snowflake, and BigQuery exports, we handle this on write and in a view for Snowflake and BigQuery, respectively. However, in S3 exports, we do not resolve identities for you. In order to merge this user activity correctly, you must resolve their identity using the user_migrations table.

How do I apply this identity resolution mapping in my data warehouse?

The user_migrations table contains a mapping of from_user_id to to_user_id, and should be joined against the users table as well as all pertinent event tables. We recommend creating a view based on these joins that will refresh on a regular cadence.

Below is an example of the view you should create to resolve identity on the users table – join user_migrations.from_user_id on users.user_id, and then coalesce to_user_id and user_id to obtain the user’s final state:

CREATE VIEW users_view AS
SELECT
  user_id,
  MIN("joindate") AS "joindate",
  MAX("last_modified") AS "last_modified",
  MAX("identity") AS "identity",
  MAX("handle") AS "handle",
  MAX("email") AS "email",
FROM
(
  SELECT
  COALESCE("to_user_id", "user_id") AS "user_id",
  "joindate","last_modified","identity","handle","email"
  FROM users u
  LEFT JOIN user_migrations m
    ON u.user_id = m.from_user_id
) x
GROUP BY user_id;

Below is an example of the view you should create to resolve identity on each of your synced event tables. You should be sure to select all unique columns from each event table in order to replicate the desired table with migrations applied. Each time you toggle on a new event table to sync in the Heap UI, you should create this migrated view for that table:

CREATE VIEW example_event_migrated_view AS 
SELECT
COALESCE("to_user_id", "user_id") AS "user_id",
"event_column_1","event_column_2","event_id","session_id","time","session_time","type","library","platform","device_type","country","region","city","ip","referrer","landing_page","browser","search_keyword","utm_source","utm_campaign","utm_medium","utm_term","utm_content","domain","query","path","hash","title","href","target_text"
FROM example_event_to_be_migrated e
LEFT JOIN user_migrations m
  ON e.user_id = m.from_user_id
;

Read in Order

Table of Contents

Was this article helpful?

Heap Connect: Identity Resolution (S3 only)

What is identity resolution?

How do I apply this identity resolution mapping in my data warehouse?