BigQuery Integration

In this article you'll learn:

  • Step-by-step instructions for connecting BigQuery to Heap
  • Additional information about raw table deduplication, supported regions, and limitations
This doc is for: Admins, Architects

Heap Connect lets you directly access your Heap data in BigQuery. You can run ad-hoc analyses, connect to BI tools such as Tableau, or join the raw Heap data with your own internal data sources.

For access to Heap Connect, contact your Customer Success Manager or sales@heap.io.

Best of all, we automatically keep the SQL data up-to-date and optimize its performance for you. Define an event within the Heap interface, and in just a few hours, you’ll be able to query it retroactively in a clean SQL format.

Setup

To access Heap Connect data through BigQuery, you need an existing Google Cloud project. After some initial setup of your project, you add the Heap service account as a BigQuery user and share your Project ID with us. Each of these steps is detailed below.

Connection Requirements

Prerequisites

Before starting the Heap Connect BigQuery connection process, you’ll need to:

  1. Have a Google Cloud Platform (GCP) project. If you don’t already have one, see Google Cloud’s documentation on creating a project.
  2. Enable billing in the GCP project. If you haven’t already done so, follow Google Cloud’s instructions for enabling billing.
  3. Enable the BigQuery API in the GCP project, if you haven’t already done so.
  4. Know the region you want to use (see Supported Regions).
  5. Decide on a name for your dataset (optional; the default is project_environment).

The first three prerequisites are also outlined in GCP’s quick-start guide.

Once you’ve completed the prerequisites, connect Heap to BigQuery as follows:

1. Authorize Heap access to BigQuery

Within the GCP dashboard for your selected project, please visit IAM & admin settings and click + Add.

The IAM & admin settings page in Google Cloud Platform

In the subsequent view, add heap-sql@heap-204122.iam.gserviceaccount.com as a BigQuery User and save the new permission.

The Add members to "Heap" section of the Google Cloud Platform page

We prefer to be added as a BigQuery User, per the steps above. At minimum, we need to be assigned the dataEditor role and also be granted the bigquery.jobs.create permission. See BigQuery’s access control doc to learn more about the different roles in BigQuery, and see this StackOverflow response for steps to create a custom IAM role that grants individual permissions to Heap.
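The two access options above can be pictured as IAM policy bindings. The following is a minimal sketch that only builds the policy data as Python dictionaries; it does not call any Google Cloud API. It assumes the standard role IDs roles/bigquery.user and roles/bigquery.dataEditor for the BigQuery User and BigQuery Data Editor roles. The helper add_binding, the project ID my-project, and the custom role ID heapJobCreator are hypothetical names used for illustration.

```python
import copy

# The Heap service account from step 1, in IAM member notation.
HEAP_MEMBER = "serviceAccount:heap-sql@heap-204122.iam.gserviceaccount.com"


def add_binding(policy, role, member):
    """Return a copy of an IAM policy dict with `member` granted `role`."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            # Role already bound: add the member only if it is missing.
            if member not in binding["members"]:
                binding["members"].append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


# Preferred option: the predefined BigQuery User role.
preferred = add_binding({"bindings": []}, "roles/bigquery.user", HEAP_MEMBER)

# Minimum option: Data Editor plus a custom role that carries only
# bigquery.jobs.create. "my-project" and "heapJobCreator" are placeholders.
custom_role = {
    "title": "Heap Job Creator",
    "includedPermissions": ["bigquery.jobs.create"],
}
minimum = add_binding({"bindings": []}, "roles/bigquery.dataEditor", HEAP_MEMBER)
minimum = add_binding(
    minimum, "projects/my-project/roles/heapJobCreator", HEAP_MEMBER
)
```

In practice you would make these changes through the IAM & admin page, as described above; the sketch is only meant to show what ends up in the project's policy under each option.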

2. Provide Heap Your GCP Project ID

Once the GCP project is configured, you’ll need to provide your Heap account team with your Project ID. You can find the Project ID within Project info on your GCP project dashboard (make sure you’re in the correct project). In the screenshot below, our project ID is heap-204419.

The Project ID highlighted in the Google Cloud Platform dashboard
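A GCP project has a name, a number, and an ID, and it is easy to copy the wrong one. As a quick sanity check before sending the value to your account team, the sketch below tests whether a string has the general shape of a project ID: 6 to 30 characters, lowercase letters, digits, and hyphens, starting with a letter and not ending with a hyphen. The function name looks_like_project_id is hypothetical, and the check only validates the format; it cannot confirm that the project exists or that it is the one you intended.

```python
import re

# Shape of a GCP project ID: 6-30 characters drawn from lowercase letters,
# digits, and hyphens; starts with a letter; does not end with a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def looks_like_project_id(value):
    """Return True if `value` has the format of a GCP project ID."""
    return PROJECT_ID_PATTERN.fullmatch(value) is not None


# The example ID from this article passes; a display name or a bare
# project number does not.
ok = looks_like_project_id("heap-204419")
not_an_id = looks_like_project_id("My Heap Project")
```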

That’s it! Your Heap account team will follow up once the initial connection has been made. Please don’t hesitate to reach out to your account team or support@heap.io with any questions.

You can learn about how the data will be structured upon sync by viewing our docs on data syncing.

BigQuery Data Schema

The data sync will include two data sets:

  • <data set name> (defaults to project_environment) – Includes views and raw tables. The views de-duplicate data but do not apply user migrations.
  • <data set name>_migrated – Includes views that apply user migrations.

For data accuracy, we recommend querying the views in the second data set, because these have identity resolution applied. If you want tighter controls over identity resolution (e.g. apply your own identity resolution), you can query the views in the first data set.
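To make the naming concrete, here is a small sketch that builds a fully qualified view reference for either data set and wraps it in a simple query string. It assumes BigQuery standard SQL, where a `project.dataset.view` path can be quoted with backticks. The helper names, the data set name main_production, and the choice of the users view are illustrative; substitute your own Project ID and data set name.

```python
def view_reference(project_id, dataset, view, migrated=True):
    """Build a backtick-quoted project.dataset.view reference.

    With migrated=True the reference points at the <data set name>_migrated
    data set, whose views apply user migrations.
    """
    dataset_name = f"{dataset}_migrated" if migrated else dataset
    return f"`{project_id}.{dataset_name}.{view}`"


def count_rows_query(project_id, dataset, view, migrated=True):
    """Return a query string that counts the rows in the chosen view."""
    reference = view_reference(project_id, dataset, view, migrated)
    return f"SELECT COUNT(*) AS row_count FROM {reference}"


# "main_production" is a placeholder data set name.
recommended = count_rows_query("heap-204419", "main_production", "users")
unmigrated = count_rows_query(
    "heap-204419", "main_production", "users", migrated=False
)
```

Defaulting migrated to True mirrors the recommendation above: the migrated views are the safer starting point unless you plan to apply your own identity resolution.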

‘Raw’ Tables

Each of the views (except for all_events) is backed by a “raw” table with the name <view_name>_raw. This means that every environment will have both a users view and users_raw table, for example. The views perform deduplication, as the underlying raw tables may have duplicated data introduced during the sync process.

Additionally, the users view filters out users that are the “from” user in an identify call. For these reasons, we recommend querying only against the deduplicated views.
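The naming rule and the idea of deduplication can be sketched in a few lines. This is an illustration only: the exact logic the views use is not described in this article, and the key column event_id and the sample rows below are hypothetical.

```python
def raw_table_name(view_name):
    """Return the raw table backing a view, or None for all_events."""
    if view_name == "all_events":
        return None  # all_events is the one view without a raw table
    return f"{view_name}_raw"


def deduplicate(rows, key):
    """Keep the first row seen for each value of `key`, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Hypothetical raw rows in which the sync wrote one event twice.
raw_rows = [
    {"event_id": 1, "user_id": 10},
    {"event_id": 2, "user_id": 11},
    {"event_id": 1, "user_id": 10},
]
view_rows = deduplicate(raw_rows, key="event_id")
```

Querying a raw table directly is comparable to reading raw_rows here: any duplicates introduced during the sync are still present and would need to be handled in your own query.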

Supported Regions

Heap supports syncs to regions covered by the Multi-Regional Locations EU and US. Please contact support@heap.io if your GCP Project is located in europe-west2 (London) or europe-west6 (Zürich).
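If you manage several projects, a small lookup can encode this guidance. The sketch below is deliberately conservative and rests on one assumption: only the literal multi-region names US and EU are reported as supported, the two regions named above are flagged for a support request, and every other location is reported as unconfirmed rather than guessed at. The function name region_guidance and the three return strings are hypothetical.

```python
SUPPORTED_MULTI_REGIONS = {"US", "EU"}
CONTACT_SUPPORT_REGIONS = {"europe-west2", "europe-west6"}


def region_guidance(location):
    """Map a BigQuery location string to the guidance in this article."""
    normalized = location.strip()
    if normalized.upper() in SUPPORTED_MULTI_REGIONS:
        return "supported"
    if normalized.lower() in CONTACT_SUPPORT_REGIONS:
        return "contact support"
    # Anything else is not covered explicitly here, so do not guess.
    return "unconfirmed"


guidance = {
    name: region_guidance(name) for name in ["US", "eu", "europe-west2"]
}
```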

Last updated February 18, 2021.


© 2021 Heap, Inc.