This feature is only available to customers on the Premier plan or the Pro plan as an add-on. To upgrade, contact your Customer Success Manager or sales@heap.io
Overview
Heap Connect can export your data to S3 or to three dynamic data warehouses: Redshift, BigQuery, and Snowflake. All exports are completely retroactive, just like the Heap dashboard. As you label new events and modify existing events, we seamlessly update your cluster schema with all historical data for synced events, which ensures that Connect always provides a clean and accurate representation of the data received since you’ve been using Heap.
Data Hierarchy
In Heap, all data is hierarchical: you have many users, who conduct many sessions, and every session has many events, as shown in our data model. As a result, you can have metadata attached to every level of this hierarchy, which we refer to as user, session, and event-level properties. Any information on the user level propagates to the session and event level, and any information on the session level propagates to the event level.
For example, any user property (ex. email address) is available as a property on the session and event level in Heap, and any session property (ex. UTM Source) is available as a property on the event level. However, any event-level property (ex. URL Path) is not available on the session or user level, since it pertains to only where the event itself occurred. For a complete list of all of the information we collect out-of-the-box for each type of property, see Autocaptured Data.
Heap Connect adopts a similarly hierarchical schema. By default, every environment schema will have the following tables:
- One users table (for all user information)
- One sessions table (for all session information)
- One pageviews table (for all pageview information)
You can also choose to sync any labeled or custom event in Heap, which will add a new table to the schema for each synced event. For exports to a data warehouse, we’ll also create a table entitled all_events, which includes basic information about every event in the synced event tables joined for ease in aggregate analysis.
It’s important to note that Heap Connect contains all of your labeled data in Heap that you’ve selected to sync downstream. Out-of-the-box, Heap will automatically collect all user interactions (pageviews, clicks, etc.), which we refer to as raw data. Labeling events in Heap allows you to apply an intutive name to raw data. For instance, clicking on a cta.signup
might mean that a user signed up, so you might add the label Sign Up.
After a user labels an event and syncs it to Heap Connect, Heap will automatically sync all of the historical data associated with the event in its own table, and include these events in the all_events
table. Learn more about labeling events in Events overview.
Schema
All data in your account is hosted within a single cluster or datasets (BigQuery) under that project.
By default, these are named using the project name, then an underscore, followed by the environment name. For instance, each table in your main production environment will be part of the main_production
schema, while the tables in the development environment of a project named “My iOS App” will be in a schema or dataset called my_ios_app_development
.
For each environment, Heap will create the following tables (known as views in BigQuery):
- One
users
table - One
sessions
table - One
pageviews
table - One table for each event you’ve labeled and synced in the Heap interface
- One view for each segment you’ve labeled and synced in Heap (Redshift and Snowflake only)
- One
all_events
table, which contains the built-in pageview event and all labeled or custom events synced - One
_sync_history
table - One
_sync_info
table - One
_dropped_tables
table (BigQuery only) - One
user_migrations
table which lists all instances of merging and migrating users - One
_event_metadata
table
These tables have a dynamic schema, which auto-updates to reflect any new events or properties you’ve tracked.
Time zones
Time values in Heap Connect are set in UTC, regardless of your time zone settings in Heap. The exception to this is that Snowflake’s time values are determined by your Snowflake account settings.
Users Table
This table is called users
. It resides under the projectname_environmentname
schema. It contains a row for each user in the environment. The column schema is:
Column Name | Type | Description |
---|---|---|
user_id | BIGINT | Unique ID of user, randomly generated by Heap. |
identity | TEXT | User’s username or other unique token, passed via heap.identify API. Must be unique. |
handle | TEXT | User’s username or other unique token, passed in via heap.addUserProperties API. |
email | TEXT | User’s email address, passed in via heap.addUserProperties API. |
joindate | TIMESTAMP | Timestamp without time zone of when the user was first seen. NOTE: In S3, this is in UNIX timestamp format. |
last_modified | TIMESTAMP | Timestamp without time zone of when the user’s data was last modified. NOTE: In S3, this is in UNIX timestamp format. |
user properties… | TEXT | There will be one column for every unique user property you’ve sent via the heap.addUserProperties API (name, gender, account status, etc). The column type is automatically inferred from the underlying property values. |
Initial User Properties
The default users
table does not include the Initial properties you see in Heap, e.g. Initial Device Type. An example of how to recreate the users
table downstream with these properties can be found in Creating An Enhanced Users Table.
Sessions Table
What is a session?
A session in Heap is a period of activity from a single user in your app or website. It can include many pageviews or events. On web, a session ends after 30 minutes of pageview inactivity from the user. On mobile, a session ends after 5 minutes of inactivity, regardless of whether the app’s background or foreground state.
Note that “server-side sessions” will not appear in this table, and only appears in the users view in-app. Data for server-side events toggled to sync can be found in the all_events and event tables. The sessions
table has the following schema:
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | All | ID of associated session event, randomly generated by Heap used internally by Heap for data syncing. |
user_id | BIGINT | All | ID of associated user, randomly generated by Heap. |
session_id | BIGINT | All | ID of associated session, randomly generated by Heap. |
time | TIMESTAMP | All | Timestamp without time zone of when session started. |
library | TEXT | All | Version of Heap library which began the session. Can be one of “web” or “iOS”. |
platform | TEXT | Web, iOS | User’s operating system. |
device_type | TEXT | Web, iOS | Device type, which can be one of “Mobile”, “Tablet”, or “Desktop”. |
country | TEXT | All | Country in which user session occurred, based on IP. |
region | TEXT | All | Region in which user session occurred, based on IP. |
city | TEXT | All | City in which user session occurred, based on IP. |
IP | TEXT | Web, iOS | The IP address for the session, which is used for determining geolocation. |
referrer | TEXT | Web | URL that linked to your site and initiated the session. If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as `direct` in the UI. |
landing_page | TEXT | Web | URL of the first pageview of the session. |
landing_page_query | TEXT | Web | The query parameters of the first page of the user’s session. |
landing_page_hash | TEXT | Web | The hash route of the first page of the user’s session. |
browser | TEXT | Web | User’s browser. |
search_keyword | TEXT | Web | Search term that brought the user to your site. [Deprecated] |
utm_source | TEXT | Web | GA-based utm_source tag associated with the session’s initial pageview. |
utm_campaign | TEXT | Web | GA-based utm_campaign tag associated with the session’s initial pageview. |
utm_medium | TEXT | Web | GA-based utm_medium tag associated with the session’s initial pageview. |
utm_term | TEXT | Web | GA-based utm_term tag associated with the session’s initial pageview. |
utm_content | TEXT | Web | GA-based utm_content tag associated with the session’s initial pageview. |
device | TEXT | iOS | User’s device model. |
carrier | TEXT | iOS | User’s mobile carrier. |
app_name | TEXT | iOS | Current name of iOS app, as determined by CFBundleName. |
app_version | TEXT | iOS | Current version of iOS app, as determined by CFBundleShortVersionString. |
heap_device_id | TEXT | Android | Current ID of the device. |
heap_app_name | TEXT | Android | Current name of Android app. |
heap_app_version | TEXT | Android | Current version of the Android app. |
heap_device | TEXT | Android | User’s device model. |
Pageviews Table
Heap will create a table that contains every pageview by default.
Column Name | Type | Library | Description |
---|---|---|---|
event_id | BIGINT | All | ID of associated pageview event. |
user_id | BIGINT | All | Unique ID of associated user, randomly generated by Heap. |
session_id | BIGINT | All | Unique ID of associated session, randomly generated by Heap. |
time | TIMESTAMP | All | Timestamp without time zone of when the pageview occurred. |
library | TEXT | All | Version of Heap library which started the session. Can be one of “web” or “iOS”. |
platform | TEXT | Web, iOS | User’s operating system. |
device_type | TEXT | Web, iOS | Device type, which can be one of “Mobile”, “Tablet”, or “Desktop”. |
country | TEXT | Web, iOS | Country in which user session occurred, based on IP. |
region | TEXT | Web, iOS | Region in which user session occurred, based on IP. |
city | TEXT | Web, iOS | City in which user session occurred, based on IP. |
IP | TEXT | Web, iOS | The IP address for the session, which is used for determining geolocation. |
referrer | TEXT | Web | URL that linked to your site and started the session.If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as `direct` in the UI. |
landing_page | TEXT | Web | URL of the first pageview of the session. |
landing_page_query | TEXT | Web | The query parameters of the first page of the user’s session. |
landing_page_hash | TEXT | Web | The hash route of the first page of the user’s session. |
browser | TEXT | Web | User’s browser. |
search_keyword | TEXT | Web | Search term that brought the user to your site. [Deprecated] |
utm_source | TEXT | Web | GA-based utm_source tag associated with the session’s initial pageview. |
utm_campaign | TEXT | Web | GA-based utm_campaign tag associated with the session’s initial pageview. |
utm_medium | TEXT | Web | GA-based utm_medium tag associated with the session’s initial pageview. |
utm_term | TEXT | Web | GA-based utm_term tag associated with the session’s initial pageview. |
utm_content | TEXT | Web | GA-based utm_content tag associated with the session’s initial pageview. |
path | TEXT | Web | The path of the pageview. |
query | TEXT | Web | The query parameters associated with the pageview. |
hash | TEXT | Web | The hash parameters associated with the pageview |
title | TEXT | Web | Title of the current page. |
device | TEXT | iOS | User’s device model. |
carrier | TEXT | iOS | User’s mobile phone carrier. |
app_name | TEXT | iOS | Current name of iOS app, as determined by CFBundleName. |
app_version | TEXT | iOS | Current version of iOS app, as determined by CFBundleShortVersionString. |
view_controller | TEXT | iOS | Name of the current view controller. |
screen_a11y_id | TEXT | iOS | accessibilityIdentifier for the current view controller. |
screen_a11y_label | TEXT | iOS | accessibilityLabel for the current view controller. |
heap_device_id | TEXT | Android | Current ID of the device. |
heap_previous_page | TEXT | Web | The previous page visited in this session. |
heap_app_name | TEXT | Android | Current name of Android app. |
heap_app_version | TEXT | Android | Current version of the Android app. |
heap_device | TEXT | Android | User’s device model. |
Event Tables
Heap will create one table for every labeled event you’ve created within Heap and synced downstream.
For each event table, the columns are determined by the version of the Heap library that sent the event, along with any custom properties you may have added via snapshots, the custom API, or defined properties. For instance, if an event was sent by the Heap iOS library, its SQL table would contain iOS-specific columns like app_version
or carrier
, but not web-specific columns like landing_page
or referrer
. This helps keep your schema as clean as possible.
Note the following about labeled properties in Heap Connect:
- Labeled properties are supported for all Heap Connect destinations.
- The default
pageviews
andsessions
tables include labeled properties (Redshift, Snowflake, S3 only). - The default
users
table does include defined properties but, because that table doesn’t include the Initial properties you see in Heap, won’t contain values for properties relying on those, e.g. Initial Marketing Channel or Initial Browser. For information on adding Initial properties to theusers
table, see our Heap Connect Common Queries.
Keep Defined Property Values Below 256 Characters
There is a 256 character limit for defined properties in the data schema. Attempting to sync properties with values above 256 characters will cause updates to fail. For this reason, we recommend keeping all defined property values below 256 characters.
The name of these tables will be the event_name
itself and remain under the projectname_environmentname
schema. To produce the event_name
portion, we strip the event name of any non-alphanumeric characters and snake-case the result. For example, an event named “Sign Up – Click Link (Any)” becomes sign_up_click_link_any
.(Note that this means it’s possible for events to collide into the same table name. Make sure your event names are sufficiently distinct!)
The name of the event table is created during your first sync and it doesn’t change after that. Heap will pre-populate the name field for you before your first sync, but you have the option to rename it.
If you change the display name of a synced event in Heap later, it won’t change the event table name downstream.
Each of these event tables will be fully retroactive, meaning it will contain a row for every occurrence of the event since the day you installed Heap. The column schema is:
Column Name | Type | Libraries | Description |
---|---|---|---|
event_id | BIGINT | All | ID of the associated event, randomly generated by Heap. |
user_id | BIGINT | All | Unique ID of the associated user, randomly generated by Heap. |
session_id | BIGINT | All | Unique ID of the associated session, randomly generated by Heap. |
time | TIMESTAMP | All | Timestamp without time zone of when the event happened. |
type | TEXT | All | For web autocaptured events, can be any of view page, click, submit, change, with push state events registered as view page events. For iOS autocaptured events, can be touch, edit field, or a gesture recognizer you’ve labeled. For custom events, this will be the custom event name. |
library | TEXT | All | Version of Heap library on which event occurred. Can be one of “web”, “iOS”, or “server”. |
platform | TEXT | Web, iOS | User’s operating system. |
device_type | TEXT | Web, iOS | Device type, which can be one of “Mobile”, “Tablet”, or “Desktop”. |
country | TEXT | Web, iOS | Country in which user session occurred, based on IP. |
region | TEXT | Web, iOS | Region in which user session occurred, based on IP. |
city | TEXT | Web, iOS | City in which user session occurred, based on IP. |
IP | TEXT | Web, iOS | The IP address for the session, which is used for determining geolocation. |
referrer | TEXT | Web | URL that linked to your site and started the session.If the user navigated directly to your site, or referral headers were stripped, then this value will appear as NULL downstream and as `direct` in the UI. |
landing_page | TEXT | Web | URL of the first pageview of the session. |
landing_page_query | TEXT | Web | The query parameters of the first page of the user’s session. |
landing_page_hash | TEXT | Web | The hash route of the first page of the user’s session. |
browser | TEXT | Web | User’s browser. |
search_keyword | TEXT | Web | Search term that brought the user to your site. [Deprecated] |
utm_source | TEXT | Web | GA-based utm_source tag associated with the session’s initial pageview. |
utm_campaign | TEXT | Web | GA-based utm_campaign tag associated with the session’s initial pageview. |
utm_medium | TEXT | Web | GA-based utm_medium tag associated with the session’s initial pageview. |
utm_term | TEXT | Web | GA-based utm_term tag associated with the session’s initial pageview. |
utm_content | TEXT | Web | GA-based utm_content tag associated with the session’s initial pageview. |
domain | TEXT | Web | Domain including subdomain, e.g. blog.heap.io. |
path | TEXT | Web | Portion of the current URL following your domain, e.g. /docs for heap.io/docs. |
hash | TEXT | Web | Portion of the current URL following the hash sign, e.g. #install for heap.io/docs#install. |
query | TEXT | Web | Query params of the page’s current URL, e.g. ?utm_id=1234 for heap.io?utm_id=1234. |
title | TEXT | Web | Title of the current page. |
href | TEXT | Web | href property of link (used for clicks on anchor tags). |
device | TEXT | iOS | User’s device model. |
carrier | TEXT | iOS | User’s mobile phone carrier. |
app_name | TEXT | iOS | Current name of iOS app, as determined by CFBundleName. |
app_version | TEXT | iOS | Current version of iOS app, as determined by CFBundleShortVersionString. |
action_method | TEXT | iOS | Name of the action method triggered by this event, e.g. loginButtonWasPressed. |
view_controller | TEXT | iOS | Name of the current view controller. |
screen_a11y_id | TEXT | iOS | accessibilityIdentifier for the current view controller. |
screen_a11y_label | TEXT | iOS | accessibilityLabel for the current view controller. |
target_view_class | TEXT | iOS | Underlying class name of an iOS action’s target, e.g. UITableCellView. |
target_view_name | TEXT | iOS | Instance variable name of an iOS action’s target, e.g. loginButtonView. |
target_a11y_id | TEXT | iOS | accessibilityIdentifier of an iOS action’s target. |
target_a11y_label | TEXT | iOS | accessibilityLabel of an iOS action’s target. |
target_text | TEXT | Web, iOS | Button text of the event target. |
heap_device_id | TEXT | Android | Current ID of the device. |
heap_previous_page | TEXT | Web | The previous page visited in this session. |
heap_app_name | TEXT | Android | Current name of Android app. |
heap_app_version | TEXT | Android | Current version of the Android app. |
heap_device | TEXT | Android | User’s device model. |
Event properties… | TEXT | All | There will be one column for every unique event property you’ve attached, either via the heap.track API, the addEventProperties API, or snapshots. The column type is automatically inferred from the underlying property values. |
Segments View (Redshift And Snowflake Only)
Heap will create one view for every defined segment you’ve created within the Heap interface. These tables only contain one column (user_id
) that you can use to join against all other user information (users
or other event tables).
If a segment leverages a behavioral filter (e.g., “Users who have done”), make sure the event that’s referenced in the filter is also synced!
Column Name | Type | Description |
---|---|---|
user_id | BIGINT | Unique ID of associated user, randomly generated by Heap. |
All Events Table
Table Does Not Sync For S3
The all_events
table does not sync for S3. You can create an all_events
view of all synced tables after the data synced to S3 has gone through ETL processing.
Heap will create a table called all_events
, which contains all labeled or custom API events, pageviews, and sessions you have synced, as well as the built-in pageview event. The all_events
table is created on top of the other event tables, and includes an extra column event_view_name
that denotes the source table of each row. It will have the following columns:
Column Name | Type | Description |
---|---|---|
event_id | BIGINT | ID of associated event, randomly generated by Heap. |
user_id | BIGINT | Unique ID of associated user, randomly generated by Heap. |
session_id | BIGINT | Unique ID of associated session, randomly generated by Heap. |
time | TIMESTAMP | Timestamp without time zone of when event happened. |
event_table_name | TEXT | Name of the table where similar events can be found. |
BigQuery Limitations
In most cases, the schema for the all_events
view will consist of the union of all columns from all event tables, with nulls filled in where applicable. However, if you are syncing many tables to your project, this can run into BigQuery query length limits, in which case we’ll fall back to one of the following reduced column sets:
- All built-in Heap columns – i.e.,
event_view_name
and columns contained in the pageviews/sessions tables - Basic columns –
event_view_name
,user_id
,event_id
,session_id
,time
If your all_events
view is created with a reduced column set, you can join it with the underlying event views to access any property columns that are not available in the view.
Sync History Table
Heap creates a table called _sync_history
that contains a record for every attempted sync. This table is helpful for monitoring sync performance and scheduling dependent jobs. All timestamps are in UTC for Redshift and BigQuery and in local time for Snowflake.
Column Name | Type | Description |
---|---|---|
status | STRING | takes value running/succeeded/failed based on the outcome of the sync attempt |
start_time | TIMESTAMP | start time of the sync attempt |
finish_time | TIMESTAMP | finish time of the sync attempt |
error | STRING | error message if sync fails |
next_scheduled_sync_at | TIMESTAMP | time of the next scheduled sync (estimated at the time of sync based on sync frequency settings) |
Sync Info Table
Heap creates a table called _sync_info
that contains metadata around the sync process and last updated times.
Column Name | Type | Description |
---|---|---|
event_table_name | TEXT | the event name |
sync_started | TIMESTAMP | a timestamp for when that table began syncing |
sync_ended | TIMESTAMP | a timestamp for when that table completed syncing |
synced_to_time | TIMESTAMP | a timestamp that reflects the most recent occurrence of the event |
inserted_row_count | BIGINT | the number of rows inserted during the most recent sync |
Dropped Tables Table
The _dropped_tables
table has been deprecated as of October 20, 2023.
Heap creates a table called _dropped_tables
that contains a record of all data tables and views that were dropped as part of an update.
Column Name | Type | Description |
---|---|---|
event_table_name | STRING | name of table or view dropped |
dropped_at | TIMESTAMP | time of drop |
User Migrations Table
When a user is identified, a migration occurs to aggregate the data under the new identity. These migrations are recorded in this table.
Time is a column for Redshift and S3 destinations only. Snowflake and BigQuery do not have a column for time.
Column Name | Type | Description |
---|---|---|
from_user_id | BIGINT | the migrating user’s ID |
to_user_id | BIGINT | the destination user’s ID |
time | TIMESTAMP | a timestamp for when the migration occurred |
Event Metadata Table
This table contains metadata about the names of events, which allows you to build automated jobs that use this metadata. This table is re-created on each sync to stay updated with the latest event metadata.
Column Name | Type | Description |
---|---|---|
table_name | TEXT | For each event, this row contains the sanitized name of the table containing the event’s data synced to your warehouse |
ui_name | TEXT | For each event, this row contains the name of the event in Heap |
last_updated_at | TIMESTAMP | Time event label was last updated |
last_updated_by | TEXT | User who last updated event label |
notes | TEXT | Notes attached to the event |
source | TEXT | Data capture source (ex: web, ios, android) |
verified_at | TIMESTAMP | Time event was verified |
verified_by | TEXT | User who verified the event |
Event_metadata shows up differently for S3. You can query the table as follows.
{ui_name: 'Stripe charge',
table_name: 'stripe_charge',
source: 'stripe',
last_updated_at: '1609459200000',
last_updated_by: 'sales@heap.io',
notes: 'This is a note',
verified_at: '1640995200000',
verified_by: 'heappo@heap.io',
}
Data Syncing
As you labeled and sync new events and modify existing synced events, we seamlessly update your cluster schema. This ensures that the data is always a clean, accurate representation of your raw data.
In particular, the following changes can happen on each sync:
- When an event is toggled to sync for the first time, we create a new table for that event.
- When a synced event label is modified in the Heap interface, we tear down the existing table for that event, create a new one in its place, and populate the new table.
- Archiving events and properties that are synced downstream will pause these syncs indefinitely until they are restored. Specifically:
- Archived events will stop syncing downstream. Unarchiving these events will cause a new, full sync of the table.
- Archived properties will stop syncing downstream (the column will remain but it will be empty). Unarchiving these properties will cause the property to sync again, but there might be a gap in the table (with no backfill occurring).
- When a custom event property is seen for the first time, we add a new property column to the corresponding event table.
- When a custom user property is seen for the first time, we add a new user property column to the
users
table. - When user-level property values are updated with
heap.addUserProperties
, we update corresponding rows in theusers
table. - When an anonymous user gets identified with an existing identity via
heap.identify
, we “migrate” the user by running an update on theusers
table and another update on all event tables that reference the anonymous user’suser_id
.
Toggle Multiple Events or Segments To Sync
You can toggle multiple events or segments to sync from the Data > Labeled events or Segments page. Click the checkbox next to the event or segment, then click the Sync icon next to x events/segments selected.
You’ll be prompted to select your warehouse from the data warehouse drop-down, then confirm that you want to enable the selected events to be synced to the selected warehouse.
Toggle Multiple Properties To Sync/Unsync
Syncing / Unsyncing properties do not resync the full data set, so it won’t impact sync performance!
You can toggle multiple properties to sync or unsync. Unsyncing properties allows you to prevent the buildup of table cruft downstream, which can impact your sync performance.
To sync or unsync properties, from Data > Properties (event & users), click the checkbox next to the property, then click the sync or unsync icon next to x properties selected.
You can select an entire category by first selecting one property within that category. A checkbox will appear next to the category name which you can then select.
A pop-up will appear where you can select the warehouse(s) to sync or unsync these properties to.
Table Relationships and Joining with Other Data
Joining Heap Connect Tables
Given the hierarchical nature of the data model, joining user information with event data is simple in Heap Connect. All you need to do is join the user table using Heap’s unique user ID with the table(s) you are analyzing.
main_production.users
JOIN main_production.[event_name]
ON main_production.users.user_id = main_production.[event_name].user_id
The same approach can be taken using a combination of the user and session ID by joining the sessions table with any event table(s) of interest.
Joining With Other Data Sets
Similarly, Heap Connect allows you to easily combine user interaction data with data from other sources such as support, marketing automation, or revenue data. We recommend sending an internal user ID shared between these sources via our Identify API, which is then exposed retroactively on the users table in Heap and ready to join with other sources. Usually, you’ll need to join the event and users tables first so that the internal ID is available on the event level.
SELECT main_production.users.email, is_won
FROM main_production.users
JOIN main_production.viewed_customer_page
ON main_production.users.user_id=main_production.viewed_costumer_page.user_id
JOIN salesforce_.accounts
ON main_production.users.email=salesforce._account.name
Depending on the structure of your data, this query would show the account email of users who had viewed the customer page and whether or not that contract was won. Analyzing these two sources together can reveal whether or not the customers’ page is correlated with the number of deals won and help you answer questions like, “How can my sales team push more prospects to view this documentation?” or “How can I change my page to make it a more valuable resource?”