How can I improve Redshift query performance?

There are several common reasons why you may see your Redshift queries running slowly. Review this guide for information on how to improve Redshift query performance to speed up your queries. If you are still seeing performance issues after applying the tips in this guide, contact us via the Get support page.

Check to see if you’re running out of space

The query below will give you the percentage of storage used in your cluster, which you can use to check if this is the reason for slow query performance. If you’re nearing capacity (80% or greater), consider adding some more nodes or truncating some unused historical data.

SELECT sum(pct_used)
FROM svv_table_info;

Write specific, well-structured queries

Writing well-structured queries that are as specific as possible will allow you to get results as quickly as possible. As an example, you might run this query to search for a certain set of data for ALL time (3 years).

SELECT * 
FROM table;

In comparison, if you were to run this query to search for this same data over only the last month, the results will appear faster.

SELECT * 
FROM table 
WHERE time > '2020-10-01'

As another example, if you select for every column (ex. 100 columns) of data on the table, the query will run significantly slower than one for just your necessary columns (ex. 2) such as the query below.

SELECT name, email
FROM table

Depending on your desired results, a combination of the two would be most effective.

SELECT name, email 
FROM table 
WHERE time > '2020-10-01'

When running JOIN clauses, you should always try (if possible) to incorporate the table’s `distkey` (a property specific to Redshift, and always the ‘user_id’ column in Heap schema) in the JOIN clause. For example:

SELECT ...
FROM <schema>.<table_name>
LEFT JOIN <schema>.<other_table_name> 
ON <table_name>.session_id = <other_table_name>.session_id

The example above doesn’t use the distkey “user_id” in the JOIN clause. Here’s a faster version that uses the distkey in the JOIN clause.

SELECT ...
FROM <schema>.<table_name>
LEFT JOIN <schema>.<other_table_name> USING (user_id, session_id)

Stagger ETL processes to avoid overlap

If you have multiple ETL processes loading into your warehouse at the same time, everything will slow down. Try to schedule them at different times when your cluster is least active.

For more tips, see the following Amazon docs on designing Redshift queries:

To learn more about writing SQL, check out these resources:

Table of Contents

Was this article helpful?

How can I improve Redshift query performance?

Check to see if you’re running out of space

Write specific, well-structured queries

Stagger ETL processes to avoid overlap