Heap Connect lets you directly access your Heap data using SQL in your Redshift warehouse. You can run ad-hoc analyses, connect to BI tools such as Tableau, or join the raw Heap data with your own internal data sources.
Best of all, we automatically keep the data up-to-date and optimize its performance for you. Define an event within the Heap interface, and in just a few hours, you’ll be able to query it retroactively in a clean SQL format.
Note: Currently, we do not sync initial user properties to the users table in Redshift.
Heap Connect can sync data to Amazon Redshift. To get started, you’ll need to choose one of two options:
- Heap hosts the Redshift cluster for you. We’ll provision a Redshift cluster, manage the details of configuration ourselves, and provide you with access credentials.
- Host the Redshift cluster yourself. You’ll need to grant us access to an existing Redshift cluster. Have your credentials ready, which are: the cluster’s hostname, port, username, and password. You’ll also need to whitelist our IP if your cluster isn’t exposed to the public.
In both scenarios, you need to be careful that any changes you make to the cluster do not interfere with Heap’s dynamic schema updates.
To connect Heap to Redshift, proceed as follows:
1. Create Heap User
In order for Heap to sync to your Redshift cluster, you’ll need to create a user for Heap within the database of choice who at a minimum has
CREATE privileges. If possible, we do prefer a superuser, as it will help with performance optimization.
2. Update Table Permissions
Once the Heap user is created, you’ll need to update permissions for users that access your Redshift cluster by updating the default privileges for tables created by our Heap user in future syncs. Please run the following command as any superuser after creating the Heap user:
ALTER DEFAULT PRIVILEGES FOR USER <heap_username> GRANT SELECT ON TABLES TO <target_user/group>
Please note that you’ll need to specify any other permissions you’d like to provide your target users or target group if users require other permissions beyond
SELECT for the Heap data.
3. Whitelist Heap IP Address
Additionally, you’ll need to make sure that you’ve whitelisted connections from the following IP address:
4. Fill in your credentials on the Redshift setup page in Heap
If you do not see the setup page, please reach out to email@example.com. You need to know the following information about your Redshift cluster. This information is available on the Redshift configuration page:
- Endpoint (host:port format, port is generally 5439)
- Database Name
- Schema Name (default to “main_production”)
You can learn more about how the data will be structured upon sync by viewing our docs on data syncing.
Note: It may take up to 24 hours for us to prepare your account for syncing. This is noted in the ‘sync pending’ state.
Under the hood, we tune your Redshift schema to ensure that it is as performant as possible. In particular, this means:
- The sort key on each table is the
timefield. This ensures that recent data gets accessed more quickly than older data.
- The distribution key on each table is the
user_idfield. This collocates user and event data for a given
user_idon the same node slice, which ensures that user level joins between the
userstable and event tables are as performant as possible.
- Automatic compression is enabled. This ensures that columnar data is optimally encoded, which reduces disk I/O and improves query performance.
To learn more about concepts such as column compression and node distribution, take a look at the Amazon Redshift docs.
For the strongest encryption between Heap and your database, you can create a SSH tunnel to a tunnel server to connect Heap to your Redshift warehouse.
Whitelist IP Address
You’ll need to make sure that you’ve whitelisted connections from the following IP address: 18.104.22.168
Prepare Tunnel Host
You will need to prepare your host by creating a heap user and adding the Heap public key (see below) to the heap .ssh/authorized_keys file, as follows:
- Create heap group:
sudo groupadd heap
- Create heap user and its home directory:
sudo useradd -m -g heap heap
- Switch to heap user:
sudo su - heap
- Create the .ssh directory:
- Set permissions:
chmod 700 ~/.ssh
- Change to the .ssh directory:
- Create the authorized_keys file:
- Set permissions:
chmod 600 authorized_keys
Heap Public Key
The following key should be used as the Heap public key:
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDEe2sdAe1c+6fnoVMkbLCnJY56IPUZMsbTnch5MmmwAOwr/DDMaamQDhXNoZMl5OwF4dTSoMpG5z8qwGfrH5AGINfSaUvGyIRCNHatTYHd6pc81+ztLYkk82/3Z2sgrrfV/k3v+Yt/NMy0Esdo+x+kTr05XhvhbQ02BEm2TXwmLzKcNbQfkO/X16npkrQmAu0kqpo438hQgT/gmQv0KDGIgtUwfur7QmqrgwlaAUY7Y/zHJtFio1nglxztcsjT/iMTxhYzXaGR8yxdqmsJf21oscfQy7rvVJU7ePImvPgXfhG+zZTcdIGh+673BIP7Av6XMc0Mb15lMGrhq884SdDp firstname.lastname@example.org
Using your favorite text editor, add the ssh key provided by Heap to authorized_keys file. The key must be all on one line.
Provide SSH Tunnel Metadata
Along with the basic configuration details (see below), you should provide to Heap the following information about the SSH tunnel configuration:
- IP address or DNS name of the tunnel server as seen from the public internet
- SSH port of the tunnel server as seen from the public internet
- Username on the tunnel server for the SSH connection (the standard is ‘heap’)
Once completed, you’ll also need to ensure that you follow the database configuration steps and provide the requested metadata in our Connection Requirements article.