Skip to content
  • Home
  • Developers
  • API
  • Releases
  • Community
  • University
  • Status
  • Home
  • Developers
  • API
  • Releases
  • Community
  • University
  • Status
Home Heap Connect Heap Connect FAQs How can I improve Redshift query performance?
Session Replay Getting Started Administration Data Management Definitions Charts Analysis Examples Heap Plays Integrations Heap Connect Data Privacy

Table of Contents

Was this article helpful?

Yes No

Thank you for your feedback!

How can I improve Redshift query performance?

This doc is for: Admins Architects

There are several common reasons why you may see your Redshift queries running slowly. Review this guide for information on how to improve Redshift query performance to speed up your queries. If you are still seeing performance issues after applying the tips in this guide, contact us via the Get support page.

Check to see if you’re running out of space

The query below will give you the percentage of storage used in your cluster, which you can use to check if this is the reason for slow query performance. If you’re nearing capacity (80% or greater), consider adding some more nodes or truncating some unused historical data.

SELECT sum(pct_used)
FROM svv_table_info;

Write specific, well-structured queries

Writing well-structured queries that are as specific as possible will allow you to get results as quickly as possible. As an example, you might run this query to search for a certain set of data for ALL time (3 years).

SELECT * 
FROM table;

In comparison, if you were to run this query to search for this same data over only the last month, the results will appear faster.

SELECT * 
FROM table 
WHERE time > '2020-10-01'

As another example, if you select for every column (ex. 100 columns) of data on the table, the query will run significantly slower than one for just your necessary columns (ex. 2) such as the query below.

SELECT name, email
FROM table

Depending on your desired results, a combination of the two would be most effective.

SELECT name, email 
FROM table 
WHERE time > '2020-10-01'

When running JOIN clauses, you should always try (if possible) to incorporate the table’s `distkey` (a property specific to Redshift, and always the ‘user_id’ column in Heap schema) in the JOIN clause. For example:

SELECT ...
FROM <schema>.<table_name>
LEFT JOIN <schema>.<other_table_name> 
ON <table_name>.session_id = <other_table_name>.session_id

The example above does not use the distkey “user_id” in the JOIN clause. Here’s a faster version that uses the distkey in the JOIN clause.

SELECT ...
FROM <schema>.<table_name>
LEFT JOIN <schema>.<other_table_name> USING (user_id, session_id)

Stagger ETL processes to avoid overlap

If you have multiple ETL processes loading into your warehouse at the same time, everything will slow down. Try to schedule them at different times when your cluster is least active.

For more tips, see the following Amazon docs on designing Redshift queries:

  • Amazon Redshift best practices for designing queries
  • Top 10 performance tuning techniques for Amazon Redshift

To learn more about writing SQL, check out these resources:

  • Mode on Performance Tuning SQL Queries
  • Chartio on Improving Query Performance

Was this article helpful?

Yes No

Thank you for your feedback!

Last updated September 7, 2022.

queries slowredshift slowSpeed up redshiftspeed up redshift queries
  • Blog
  • Partners
  • Legal
  • Security
  • Terms
  • About
  • Careers
  • Privacy
  • Contact Us

© 2023 Heap, Inc.