What is identity resolution?
Heap’s identify API allows you to create a single, cohesive view of a user across devices, browsers, and domains. When a user is identified via Heap’s identify API, their anonymous user_id is updated to a new user_id, which is a hash of the identity. Once this is set up, when this user is identified on another device or browser, this tells us that these users are the same, and that we should join their data together.
Internally, we call this a “user migration” because we are migrating user and event data from one record to another. In other words, we are resolving data from two users into one identity. This is identity resolution, at a high-level.
In the Heap app, we handle this for you. In Redshift and Snowflake exports, we handle this on write and in a view, respectively. However, in both BigQuery and S3 exports, we do not resolve identities for you. In order to merge this user activity correctly, you must resolve their identity using the
How do I apply this identity resolution mapping in my data warehouse?
The user_migrations table contains a mapping of from_user_id to to_user_id, and should be joined against the users table as well as all pertinent event tables. We recommend creating a view based on these joins that will refresh on a regular cadence.
Below is an example of the view you should create to resolve identity on the users table – join user_migrations.from_user_id on users.user_id, and then coalesce to_user_id and user_id to obtain the user’s final state:
CREATE VIEW users_view AS SELECT user_id, MIN("joindate") AS "joindate", MAX("last_modified") AS "last_modified", MAX("identity") AS "identity", MAX("handle") AS "handle", MAX("email") AS "email", FROM ( SELECT COALESCE("to_user_id", "user_id") AS "user_id", "joindate","last_modified","identity","handle","email" FROM users u LEFT JOIN user_migrations m ON u.user_id = m.from_user_id ) x GROUP BY user_id;
Below is an example of the view you should create to resolve identity on each of your synced event tables. You should be sure to select all unique columns from each event table in order to replicate the desired table with migrations applied. Each time you toggle on a new event table to sync in the Heap UI, you should create this migrated view for that table:
CREATE VIEW example_event_migrated_view AS SELECT COALESCE("to_user_id", "user_id") AS "user_id", "event_column_1","event_column_2","event_id","session_id","time","session_time","type","library","platform","device_type","country","region","city","ip","referrer","landing_page","browser","search_keyword","utm_source","utm_campaign","utm_medium","utm_term","utm_content","domain","query","path","hash","title","href","target_text" FROM example_event_to_be_migrated e LEFT JOIN user_migrations m ON e.user_id = m.from_user_id ;