Overview
Heap’s Databricks integration allows you to sync Heap data to Databricks so you can leverage Heap's behavioral data in your other tools.
Prerequisites
To connect this integration, you'll need the following permissions:
- Admin or Architect privileges in Heap
- Access to an AWS-hosted Databricks account that uses the Unity Catalog
Setup
To get started, navigate to Integrations > Directory, search for Databricks, then select it when it appears.
You’ll be prompted to provide the following information:
- Hostname: The ID of your Databricks account, which you can find in the account URL.
- Path: The path of the warehouse you are connecting via this integration.
- Catalog (optional): The catalog that this Heap data should sync to; if left blank, this integration will create a new catalog.
- Schema (optional): The schema that this Heap data should sync to; if left blank, this integration will create a new schema.
- Token: This is required to allow Heap to write to the schema. The token must be a Personal Access Token (PAT) rather than an OAuth token; example grants for the token's owner appear after the Connect step below.
Once all those fields are populated, click the Connect button.
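Because the token is what lets Heap write to your warehouse, the user who generates the PAT needs the corresponding Unity Catalog privileges on the target catalog and schema. Here is a minimal sketch of those grants, assuming a hypothetical catalog heap_data, schema heap_data.main, and heap-sync@example.com as the token's owner:

-- Hypothetical names: adjust the catalog, schema, and principal to match your setup.
GRANT USE CATALOG, CREATE SCHEMA ON CATALOG heap_data TO `heap-sync@example.com`;
GRANT USE SCHEMA, CREATE TABLE, MODIFY, SELECT ON SCHEMA heap_data.main TO `heap-sync@example.com`;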
That’s it! Once setup is complete, you’ll see a sync within 48 hours that includes the following built-in tables (a sample verification query follows this list):
- pageviews
- sessions
- users
- user_migrations
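Once the first sync lands, you can confirm the tables are present with a quick query. For example, assuming the hypothetical heap_data.main catalog and schema from the grants sketch above (substitute whatever you configured during setup):

-- List the synced tables, then spot-check the most recent session rows.
SHOW TABLES IN heap_data.main;

SELECT user_id, session_id, time
FROM heap_data.main.sessions
ORDER BY time DESC
LIMIT 10;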
You can create an all_events view in Databricks with a query like this one (UNION ALL is used because each event lives in exactly one source table, so a deduplicating UNION is unnecessary):
-- Union each synced event table, tagging rows with their source table.
-- Databricks SQL uses backtick-quoted or plain identifiers, not double quotes.
SELECT
  event_id,
  time,
  user_id,
  session_id,
  'test_event_table' AS event_table_name
FROM test_db.test_schema.test_event_table
UNION ALL
SELECT
  event_id,
  time,
  user_id,
  session_id,
  'click_event_table' AS event_table_name
FROM test_db.test_schema.click_event_table
UNION ALL
SELECT
  event_id,
  time,
  user_id,
  session_id,
  'pageview_event_table' AS event_table_name
FROM test_db.test_schema.pageview_event_table
Limitations
Please note the following limitations for this integration:
- The All Events table is not synced to Databricks. As a workaround, you can create your own all_events view using a query like the one above.
- Defined properties syncing is not supported during beta.
- Segments syncing is not supported during beta.