Skip to content
  • Home
  • University
  • Developers
  • API
  • Releases
  • Status
  • Home
  • University
  • Developers
  • API
  • Releases
  • Status
Home Heap Connect Data Warehouses Redshift Integration
Getting Started Installation Administration Define & Analyze Analysis Examples Heap Plays Success Guides Integrations Heap Connect Data Privacy

Table of Contents

Was this article helpful?

Yes No

Thank you for your feedback!

Redshift Integration

In this article you'll learn:

  • Step-by-step instructions to connect Heap and Redshift
  • Detailed info on performance optimization, ssh tunneling, whitelisting IP addresses, and troubleshooting permissions
This doc is for: Admins Architects
View instructions for: 

For access to Heap Connect, contact your Customer Success Manager or sales@heap.io.

Heap Connect lets you directly access your Heap data using SQL in your Redshift warehouse. You can run ad-hoc analyses, connect to BI tools such as Tableau, or join the raw Heap data with your own internal data sources.

Best of all, we automatically keep the data up-to-date and optimize its performance for you. Define an event within the Heap interface, and in just a few hours, you’ll be able to query it retroactively in a clean SQL format.

Note: Currently, we do not sync initial user properties to the users table in Redshift.

Setup

Heap Connect can sync data to Amazon Redshift. To get started, you’ll need to choose one of two options:

  • Heap hosts the Redshift cluster for you. We’ll provision a Redshift cluster, manage the details of configuration ourselves, and provide you with access credentials.
  • Host the Redshift cluster yourself. You’ll need to grant us access to an existing Redshift cluster. Have your credentials ready, which are: the cluster’s hostname, port, username, and password. You’ll also need to whitelist our IP if your cluster isn’t exposed to the public.

In both scenarios, you need to be careful that any changes you make to the cluster do not interfere with Heap’s dynamic schema updates.

Connection Requirements

To connect Heap to Redshift, proceed as follows:

1. Create Heap User

In order for Heap to sync to your Redshift cluster, you’ll need to create a user for Heap within the database of choice who at a minimum has CREATE privileges. If possible, we do prefer a superuser, as it will help with performance optimization.

We require access to system pg_table_def. Optimally, we’d also like to have access to stv_partitions to check the remaining disk size (although this is not required).

2. Update Table Permissions

Once the Heap user is created, you’ll need to update permissions for users that access your Redshift cluster by updating the default privileges for tables created by our Heap user in future syncs. Please run the following command as any superuser after creating the Heap user: ALTER DEFAULT PRIVILEGES FOR USER <heap_username> GRANT SELECT ON TABLES TO <target_user/group>

Please note that you’ll need to specify any other permissions you’d like to provide your target users or target group if users require other permissions beyond SELECT for the Heap data.

3. Whitelist Heap IP Address

Additionally, you’ll need to make sure that you’ve whitelisted connections from the following IP address: 52.20.48.42

4. Fill in your credentials on the Redshift setup page in Heap

If you do not see the setup page, please reach out to support@heap.io. You need to know the following information about your Redshift cluster. This information is available on the Redshift configuration page:

  • Endpoint (host:port format, port is generally 5439)
  • Database Name
  • Username
  • Password
  • Database Name
  • Schema Name (default to “main_production”)
The Redshift setup page in Heap

You can learn more about how the data will be structured upon sync by viewing our docs on data syncing.

Note: It may take up to 24 hours for us to prepare your account for syncing. This is noted in the ‘sync pending’ state.

Performance Optimization

Under the hood, we tune your Redshift schema to ensure that it is as performant as possible. In particular, this means:

  • The sort key on each table is the time field. This ensures that recent data gets accessed more quickly than older data.
  • The distribution key on each table is the user_id field. This collocates user and event data for a given user_id on the same node slice, which ensures that user level joins between the users table and event tables are as performant as possible.
  • Automatic compression is enabled. This ensures that columnar data is optimally encoded, which reduces disk I/O and improves query performance.

To learn more about concepts such as column compression and node distribution, take a look at the Amazon Redshift docs.

SSH Tunneling

For the strongest encryption between Heap and your database, you can create a SSH tunnel to a tunnel server to connect Heap to your Redshift warehouse.

A diagram showing how data gets sent from the Heap network through the public internet to a private network/restricted network

Whitelist IP Address

You’ll need to make sure that you’ve whitelisted connections from the following IP address: 52.20.48.42

Prepare Tunnel Host

You will need to prepare your host by creating a heap user and adding the Heap public key (see below) to the heap .ssh/authorized_keys file, as follows:

  1. Create heap group: sudo groupadd heap
  2. Create heap user and its home directory: sudo useradd -m -g heap heap
  3. Switch to heap user: sudo su - heap
  4. Create the .ssh directory: mkdir ~/.ssh
  5. Set permissions: chmod 700 ~/.ssh
  6. Change to the .ssh directory: cd ~/.ssh
  7. Create the authorized_keys file: touch authorized_keys
  8. Set permissions: chmod 600 authorized_keys

Heap Public Key

The following key should be used as the Heap public key:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEe2sdAe1c+6fnoVMkbLCnJY56IPUZMsbTnch5MmmwAOwr/DDMaamQDhXNoZMl5OwF4dTSoMpG5z8qwGfrH5AGINfSaUvGyIRCNHatTYHd6pc81+ztLYkk82/3Z2sgrrfV/k3v+Yt/NMy0Esdo+x+kTr05XhvhbQ02BEm2TXwmLzKcNbQfkO/X16npkrQmAu0kqpo438hQgT/gmQv0KDGIgtUwfur7QmqrgwlaAUY7Y/zHJtFio1nglxztcsjT/iMTxhYzXaGR8yxdqmsJf21oscfQy7rvVJU7ePImvPgXfhG+zZTcdIGh+673BIP7Av6XMc0Mb15lMGrhq884SdDp support@heap.io

Using your favorite text editor, add the ssh key provided by Heap to authorized_keys file. The key must be all on one line.

Provide SSH Tunnel Metadata

Along with the basic configuration details (see below), you should provide to Heap the following information about the SSH tunnel configuration:

  • IP address or DNS name of the tunnel server as seen from the public internet
  • SSH port of the tunnel server as seen from the public internet
  • Username on the tunnel server for the SSH connection (the standard is ‘heap’)

Database Configuration

Once completed, you’ll also need to ensure that you follow the database configuration steps and provide the requested metadata in our Connection Requirements article.

Redshift Permissions Troubleshooting

If you didn’t apply default privileges as part of your Connection Requirements, some users may lose permissions on some tables when:

  1. An event definition changes (any change except the name)
  2. A table sync is toggled off and on again (in the Event view or the Heap Connect view)
  3. Weekly maintenance tasks are run (VACUUM, etc)

Redshift doesn’t currently provide an easy way to re-apply table-level permissions in these cases, so we recommend applying schema-level permissions as part of your setup.

Set Permissions For All Users

Typically, users will have one of two permission sets: ALL and SELECT. You can read more about Redshift permissions in Redshift’s documentation.

To correctly set ALL permissions for your users, run the following three commands on your cluster in the database containing Heap’s schema as a superuser:

GRANT ALL ON SCHEMA {schema_name} TO {comma-separated list of users};
GRANT ALL on ALL TABLES IN SCHEMA {schema_name} TO {comma-separated list of users};
ALTER DEFAULT PRIVILEGES FOR USER {heap_user} GRANT ALL ON TABLES TO {comma-separated list of users};

To correctly set SELECT permissions for your users, run the following three commands on your cluster in the database containing Heap’s schema:

GRANT USAGE ON SCHEMA {schema_name} TO {comma-separated list of users};
GRANT SELECT on ALL TABLES IN SCHEMA {schema_name} TO {comma-separated list of users};
ALTER DEFAULT PRIVILEGES FOR USER {heap_user} GRANT SELECT ON TABLES TO {comma-separated list of users};

This should ensure that the correct permissions persist when we drop and recreate tables. Please contact support@heap.io if you have any questions.

Was this article helpful?

Yes No

Thank you for your feedback!

Last updated March 1, 2021.

integrationredshiftredshift destination
  • Blog
  • Partners
  • Security
  • Terms
  • About
  • Careers
  • Privacy
  • Contact Us

© 2021 Heap, Inc.