Skip to content
  • Home
  • Developers
  • API
  • Releases
  • Community
  • University
  • Status
  • Home
  • Developers
  • API
  • Releases
  • Community
  • University
  • Status
Home Heap Connect Data Warehouses Redshift Integration
Session Replay Getting Started Administration Data Management Definitions Charts Analysis Examples Heap Plays Integrations Heap Connect Data Privacy

Table of Contents

Was this article helpful?

Yes No

Thank you for your feedback!

Redshift Integration

In this article you'll learn:

  • Step-by-step instructions to connect Heap and Redshift
  • Detailed info on performance optimization, ssh tunneling, allowlisting IP addresses, and troubleshooting permissions
This doc is for: Admins Architects

For access to Heap Connect, contact your Customer Success Manager or sales@heap.io.

Heap Connect lets you directly access your Heap data using SQL in your Redshift warehouse. You can run ad-hoc analyses, connect to BI tools such as Tableau, or join the raw Heap data with your own internal data sources.

Best of all, we automatically keep the data up-to-date and optimize its performance for you. Define an event within the Heap interface, and in just a few hours, you’ll be able to query it retroactively in a clean SQL format.

Note: Currently, we do not sync initial user properties to the users table in Redshift.

Setup

To connect Heap to Redshift, proceed as follows:

1. Create Heap User

In order for Heap to sync to your Redshift cluster, you’ll need to create a user for Heap within the database of choice who at a minimum has CREATE and USAGE privileges. If possible, we do prefer a superuser, as it will help with performance optimization.

We recommend the following cluster settings based on app traffic.

Monthly Pageviews
(estimating from sessions = # of monthly sessions x 10)
Recommended minimum Redshift cluster sizeMaximum # of tables to reliably load in under 4 hours
300-600M64 x dc2.large nodes20
150-300M32 x dc2.large nodes30
< 75M4 x dc2.large nodes35
75-150M8 x dc2.large nodes35

We require access to system pg_table_def. Optimally, we’d also like to have access to stv_partitions to check the remaining disk size (although this is not required).

2. Update Table Permissions

Once the Heap user is created, you’ll need to update permissions for users that access your Redshift cluster by updating the default privileges for tables created by our Heap user in future syncs. Please run the following command as any superuser after creating the Heap user: ALTER DEFAULT PRIVILEGES FOR USER <heap_username> GRANT SELECT ON TABLES TO <target_user/group>

Please note that you’ll need to specify any other permissions you’d like to provide your target users or target group if users require other permissions beyond SELECT for the Heap data.

3. Allowlist Heap IP Address

Additionally, you’ll need to make sure that you’ve allowlisted connections from the following IP address: 52.20.48.42

4. Fill in your credentials on the Redshift setup page in Heap

If you have been granted access to Heap Connect and you do not see the setup page, please reach out to support@heap.io. You need to know the following information about your Redshift cluster. This information is available on the Redshift configuration page:

  • Endpoint (host:port format, port is generally 5439)
  • Username
  • Password
  • Database Name
  • Schema Name (default to “main_production”)

You can learn more about how the data will be structured upon sync by viewing our docs on data syncing.

Note: It may take up to 24 hours for us to prepare your account for syncing. This is noted in the ‘sync pending’ state.

Note: The schema name defaults to snake-cased _. The name can be changed, but do not specify an existing schema in your database. The Heap schema must be a separate, new schema, that will be created upon initial sync, and no tables or views can be manually added to the Heap schema.

You can sync Defined Properties to this warehouse.

What to expect from your first sync

Once you have successfully configured Redshift, you can see your sync status on the Redshift page in Heap.

How fast you can expect the first sync to occur

The first sync can take up to 48 hours, depending on how much data you have. Subsequent syncs happen every 24 hours. Click the Configuration (gear) icon on the Redshift page to select a sync start time.

Reach out to support@heap.io if you have any questions or concerns about your first sync.

What tables you will see downstream

The Redshift connection syncs the Heap data to your downstream connection. By default it syncs 3 data tables (pageviews, users, sessions) and some additional metadata tables like user_migrations and sync_info tables.

How to sync more events downstream

  1. Go to Defintions > Events and click on the event you want to sync. Under Connect select the toggle for Redshift

2. On the pop-up that appears, select either All Time (since the day Heap was installed) or Date To Now. If you click Date To Now, a small calendar will appear where you can select a starting date.

3. The toggle you selected will now be green and read “Pending initial sync to Redshift” when you hover over it. The sync for this event doesn’t happen immediately; it will happen at your next scheduled sync.

Performance Optimization

Under the hood, we tune your Redshift schema to ensure that it is as performant as possible. In particular, this means:

  • The sort key on each table is the time field. This ensures that recent data gets accessed more quickly than older data.
  • The distribution key on each table is the user_id field. This collocates user and event data for a given user_id on the same node slice, which ensures that user level joins between the users table and event tables are as performant as possible.
  • Automatic compression is enabled. This ensures that columnar data is optimally encoded, which reduces disk I/O and improves query performance.

To learn more about concepts such as column compression and node distribution, take a look at the Amazon Redshift docs.

SSH Tunneling

For the strongest encryption between Heap and your database, you can create a SSH tunnel to a tunnel server to connect Heap to your Redshift warehouse.

A diagram showing how data gets sent from the Heap network through the public internet to a private network/restricted network

Allowlist IP Address

You’ll need to make sure that you’ve allowlisted connections from the following IP address: 52.20.48.42

Prepare Tunnel Host

You will need to prepare your host by creating a heap user and adding the Heap public key (see below) to the heap .ssh/authorized_keys file, as follows:

  1. Create heap group: sudo groupadd heap
  2. Create heap user and its home directory: sudo useradd -m -g heap heap
  3. Switch to heap user: sudo su - heap
  4. Create the .ssh directory: mkdir ~/.ssh
  5. Set permissions: chmod 700 ~/.ssh
  6. Change to the .ssh directory: cd ~/.ssh
  7. Create the authorized_keys file: touch authorized_keys
  8. Set permissions: chmod 600 authorized_keys

Heap Public Key

The following key should be used as the Heap public key:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEe2sdAe1c+6fnoVMkbLCnJY56IPUZMsbTnch5MmmwAOwr/DDMaamQDhXNoZMl5OwF4dTSoMpG5z8qwGfrH5AGINfSaUvGyIRCNHatTYHd6pc81+ztLYkk82/3Z2sgrrfV/k3v+Yt/NMy0Esdo+x+kTr05XhvhbQ02BEm2TXwmLzKcNbQfkO/X16npkrQmAu0kqpo438hQgT/gmQv0KDGIgtUwfur7QmqrgwlaAUY7Y/zHJtFio1nglxztcsjT/iMTxhYzXaGR8yxdqmsJf21oscfQy7rvVJU7ePImvPgXfhG+zZTcdIGh+673BIP7Av6XMc0Mb15lMGrhq884SdDp support@heap.io

Using your favorite text editor, add the ssh key provided by Heap to authorized_keys file. The key must be all on one line.

Provide SSH Tunnel Metadata

Along with the basic configuration details (see below), you should provide to Heap the following information about the SSH tunnel configuration:

  • IP address or DNS name of the tunnel server as seen from the public internet
  • SSH port of the tunnel server as seen from the public internet
  • Username on the tunnel server for the SSH connection (the standard is ‘heap’)

Database Configuration

Once completed, you’ll also need to ensure that you follow the database configuration steps and provide the requested metadata in our Connection Requirements article.

Redshift Permissions Troubleshooting

If you didn’t apply default privileges as part of your Connection Requirements, some users may lose permissions on some tables when:

  1. An event definition changes (any change except the name)
  2. A table sync is toggled off and on again (in the Event view or the Heap Connect view)
  3. Weekly maintenance tasks are run (VACUUM, etc)

Redshift doesn’t currently provide an easy way to re-apply table-level permissions in these cases, so we recommend applying schema-level permissions as part of your setup.

Set Permissions For All Users

Typically, users will have one of two permission sets: ALL and SELECT. You can read more about Redshift permissions in Redshift’s documentation.

To correctly set ALL permissions for your users, run the following three commands on your cluster in the database containing Heap’s schema as a superuser:

GRANT ALL ON SCHEMA {schema_name} TO {comma-separated list of users};
GRANT ALL on ALL TABLES IN SCHEMA {schema_name} TO {comma-separated list of users};
ALTER DEFAULT PRIVILEGES FOR USER {heap_user} GRANT ALL ON TABLES TO {comma-separated list of users};

To correctly set SELECT permissions for your users, run the following three commands on your cluster in the database containing Heap’s schema:

GRANT USAGE ON SCHEMA {schema_name} TO {comma-separated list of users};
GRANT SELECT on ALL TABLES IN SCHEMA {schema_name} TO {comma-separated list of users};
ALTER DEFAULT PRIVILEGES FOR USER {heap_user} GRANT SELECT ON TABLES TO {comma-separated list of users};

This should ensure that the correct permissions persist when we drop and recreate tables. Please contact support@heap.io if you have any questions.

Was this article helpful?

Yes No

Thank you for your feedback!

Last updated December 22, 2022.

integrationredshiftredshift destination
  • Blog
  • Partners
  • Legal
  • Security
  • Terms
  • About
  • Careers
  • Privacy
  • Contact Us

© 2023 Heap, Inc.