Regex (short for regular expression) is a mini programming language used to define a search pattern. Learn more about it at RegexOne and play around with formulas at Regular Expressions 101.
Formulas are highly customizable to your app’s data structure and analysis needs. Here are some broadly applicable ways to use regex in a formula:
- Extract a query string value given a key
- Extract a substring from a path
- Extract a custom attribute from the hierarchy
- Remove trailing slashes from a URL path
Review the use cases below for additional examples of how to set up your own formulas.
Heap uses PostgreSQL’s regex engine, which handles certain types of formatting (e.g., hyphens) differently than other regex engines. For more information on how to format regex for PostgreSQL’s regex engine, see the official PostgreSQL documentation.
Extracting non-UTM parameters from a query string
You may want to capture non-UTM parameters directly from a URL, such as capturing the channel from ?source=CHANNEL.
You can set this up using the Extract Using Regular Expression function. In this example, the regex is: [?&]source=([^&]*)
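To check what this pattern captures before saving the formula, you can try it outside Heap. The following is a minimal sketch using Python's re module, not Heap's PostgreSQL engine, so edge cases may behave differently. The function name and sample URLs are hypothetical.

```python
import re

# Pattern from the text: capture everything after "source=" up to the next "&".
SOURCE_PATTERN = re.compile(r"[?&]source=([^&]*)")


def extract_source(url):
    """Return the value of the source parameter, or None if it is absent."""
    match = SOURCE_PATTERN.search(url)
    return match.group(1) if match else None


# Hypothetical sample URLs, for illustration only.
print(extract_source("https://example.com/pricing?source=newsletter&plan=pro"))  # -> newsletter
print(extract_source("https://example.com/pricing?plan=pro&source=partner"))     # -> partner
print(extract_source("https://example.com/pricing?utm_source=google"))           # -> None
```

Because the pattern requires ? or & immediately before source, a parameter such as utm_source is not matched by accident.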
Extracting the product ID of a URL
You may want to grab the product ID from the URL to analyze the popularity of your different product pages.
To do so, use the Extract Using Regular Expression function and define the correct regex to apply to your URL. In this example, it is: \/product\/(\d+)
Note that the regular expression you need to define will vary based on the path of your URL.
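As a quick way to see how the pattern depends on the path, here is a small sketch in Python. It uses Python's re module as a stand-in for Heap's PostgreSQL engine, and the function name and sample paths are assumptions made for illustration.

```python
import re

# Pattern from the text: one or more digits directly after "/product/".
PRODUCT_ID_PATTERN = re.compile(r"\/product\/(\d+)")


def extract_product_id(path):
    """Return the numeric product ID as a string, or None if the path has none."""
    match = PRODUCT_ID_PATTERN.search(path)
    return match.group(1) if match else None


# Hypothetical sample paths, for illustration only.
print(extract_product_id("/product/48213"))          # -> 48213
print(extract_product_id("/product/48213/reviews"))  # -> 48213
print(extract_product_id("/products/48213"))         # -> None
```

The last call returns nothing because the path uses /products/ rather than /product/, which is the kind of variation you would need to account for in your own regex.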
Extracting custom attributes from the hierarchy
You may need to extract a piece of information from the hierarchy to capture in its own property, such as the value of a data attribute like [data-testid=value].
To do so, select Extract Using Regular Expression as the function and define the regex to extract the attribute value from the hierarchy. You can find an example regular expression here.
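For illustration only, one possible pattern for the [data-testid=value] case is sketched below in Python. This is an assumption, not the example regex linked above. The hierarchy string is a simplified, hypothetical stand-in for what Heap stores, and Python's re module may not behave identically to PostgreSQL's engine.

```python
import re

# Assumed pattern (not taken from the article): capture whatever sits between
# "[data-testid=" and the closing "]".
TESTID_PATTERN = re.compile(r"\[data-testid=([^\]]*)\]")


def extract_testid(hierarchy):
    """Return the data-testid value found in the hierarchy string, or None."""
    match = TESTID_PATTERN.search(hierarchy)
    return match.group(1) if match else None


# Hypothetical, simplified hierarchy string; the real format may differ.
sample_hierarchy = "div.cart form button[data-testid=checkout-button][type=submit]"
print(extract_testid(sample_hierarchy))   # -> checkout-button
print(extract_testid("div.cart button"))  # -> None
```

To capture a different attribute, you would swap data-testid for that attribute's name and then test the result against your own hierarchy values.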
Remove trailing slashes in reports when grouping by path
When grouping by path, paths that are semantically the same are sometimes split in the results, such as /test/blog-post-123/ and /test/blog-post-123, where the only difference is the presence of the slash at the end. You can set up a formula using the following regex: (\/.*?)\/?$ to remove the trailing slash and thus ensure this data is reconciled into consistent properties.
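The effect of this regex can be checked with a short sketch in Python. It assumes Python's re module gives the same result as Heap's PostgreSQL engine for these simple paths, and the function name is hypothetical.

```python
import re

# Pattern from the text: lazily capture the path, leaving one optional
# trailing slash outside the capture group.
TRAILING_SLASH_PATTERN = re.compile(r"(\/.*?)\/?$")


def strip_trailing_slash(path):
    """Return the path without a single trailing slash, or None if no match."""
    match = TRAILING_SLASH_PATTERN.search(path)
    return match.group(1) if match else None


# Both variants from the text reduce to the same value.
print(strip_trailing_slash("/test/blog-post-123/"))  # -> /test/blog-post-123
print(strip_trailing_slash("/test/blog-post-123"))   # -> /test/blog-post-123
print(strip_trailing_slash("/"))                     # -> /
```

The capture group is lazy (.*?), so the optional slash before the end of the string is left out of the captured value. The root path / is returned unchanged.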