Data in Heap may differ from data seen in other tools for a variety of reasons. Generally speaking, discrepancies of up to 10% can be normal and are expected from the differences between tools alone.
Known Differences by Platform
One of the most common causes of data discrepancies between Heap and other tools is installing Heap via a tag manager, like Google Tag Manager, or directly in the body of your pages.
When Heap is installed in the body, it has to wait for the entire page and its contents to load before Heap even begins to load. This process often takes a few seconds, and during that time, the user who loaded the page may have already clicked through one of your links, or clicked the back button in their browser.
For the most accurate data collection, make sure to install Heap directly in the head element of your pages.
If you’ve noticed that a Referrer or Initial Referrer in Heap displays your own domain, you’ve run into a self referral. For details on why this may appear, see Why do I see referrals from my own domain in Heap?
Form Submissions vs Internal Databases
Heap captures clicks, form changes, and form submissions automatically. To do this, Heap hooks into these events and tries to capture them ahead of any other event handlers. As a result, if your form has validation, as most forms do, Heap will still capture a form submit event even when your own validation signals that the form is incomplete.
If you want to compare internal data to Heap data for form submissions, there are a few recommended approaches.
The lowest-effort approach is to look at the post-submit URL. For most shopping cart purchases, the user lands on a confirmation or thank you page. For sign-ups or logins, they land on a profile or signup page. If your application has this flow, then using those pages as the event of comparison will likely produce better results.
To ensure that successful form submissions are tracked only when the form is completed, you can hook Heap’s custom tracking API directly into your existing form validation code.
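One way to picture this is a submit handler that calls the tracking function only after validation passes. The sketch below is a minimal illustration, not Heap’s API: `validate_signup`, `handle_submit`, the event name `Signup Form Completed`, and the injected `track` callable are all hypothetical stand-ins for your own validation code and for whichever Heap tracking call your installation uses.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a custom tracking call: (event_name, properties).
Tracker = Callable[[str, Dict[str, str]], None]


def validate_signup(fields: Dict[str, str]) -> List[str]:
    """Return a list of validation errors (empty when the form is complete)."""
    errors = []
    if "@" not in fields.get("email", ""):
        errors.append("email is invalid")
    if len(fields.get("password", "")) < 8:
        errors.append("password is too short")
    return errors


def handle_submit(fields: Dict[str, str], track: Tracker) -> bool:
    """Send the custom event only when validation succeeds."""
    if validate_signup(fields):
        # Validation failed: no custom event is sent for this attempt.
        return False
    track("Signup Form Completed", {"plan": fields.get("plan", "unknown")})
    return True


# Usage: collect tracked events in a list instead of sending them anywhere.
sent = []
record = lambda name, props: sent.append((name, props))
handle_submit({"email": "a@example.com", "password": "short"}, record)        # rejected
handle_submit({"email": "a@example.com", "password": "long-enough"}, record)  # tracked
```

With this sample input, only the second call records an event, so the custom event count reflects completed forms rather than every submit attempt.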
While Heap and Google Analytics are both client-side tracking libraries, they operate on fundamentally different data models, which can produce discrepancies in the results between platforms. Discrepancies of 5-15% between Google Analytics and Heap are considered within the expected range.
Different Data Models Lead To Different Session Counts
It’s not uncommon for Google Analytics to report a much higher session count than Heap, despite the definition of a session being very similar between both tools.
Heap’s data model is user-centric, whereas Google Analytics’ data model is session-based. A session in Heap is defined as a period of activity from a single user on your website, and ends after 30 minutes of pageview inactivity. The same definition applies in Google Analytics, except that Google Analytics counts additional sessions for clicks on AdWords campaigns.
Additionally, a session that crosses midnight will automatically count as two sessions in Google Analytics and one session in Heap.
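The effect of these rules on session counts can be illustrated with a small simulation. The sketch below is a simplified model, not either vendor’s implementation: the function name, the `split_at_midnight` flag, and the sample timestamps are assumptions made for illustration, and AdWords clicks are ignored.

```python
from datetime import datetime, timedelta
from typing import List

INACTIVITY_TIMEOUT = timedelta(minutes=30)


def count_sessions(pageviews: List[datetime], split_at_midnight: bool) -> int:
    """Count sessions for one user from pageview timestamps."""
    sessions = 0
    previous = None
    for ts in sorted(pageviews):
        new_session = (
            previous is None
            or ts - previous > INACTIVITY_TIMEOUT
            or (split_at_midnight and ts.date() != previous.date())
        )
        if new_session:
            sessions += 1
        previous = ts
    return sessions


views = [
    datetime(2024, 3, 1, 23, 50),
    datetime(2024, 3, 2, 0, 5),   # 15 minutes later, but after midnight
    datetime(2024, 3, 2, 0, 20),
]
no_split = count_sessions(views, split_at_midnight=False)
midnight_split = count_sessions(views, split_at_midnight=True)
```

For these three pageviews, `no_split` is 1 and `midnight_split` is 2, which is the kind of gap the midnight rule introduces.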
You may see a difference in the attribution metrics in Heap and Google Analytics. Google Analytics’ model is built from a complex attribution-processing chart. One of the methods that is applied to processing is source precedence, which is a Google Analytics-specific attribution concept that does not exist in Heap.
Source precedence means that any direct-traffic visit that follows a paid-referred visit (within 3-6 months) is attributed to the existing paid campaign. The most recent paid campaign visit is listed as the referral for the direct-traffic visit in Google Analytics.
Heap captures UTM parameters for a session if they are present in the session’s landing page URL. Heap captures the referrer based on the value of `document.referrer` of the session’s landing page. This is the URL of the page that comes directly before the session’s landing page.
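To check what a given landing page URL would contribute, you can pull the UTM query parameters out of it yourself. This is a minimal sketch using Python’s standard library, not Heap’s capture code; the function name, the sample URL, and the choice of the five conventional UTM keys are assumptions.

```python
from typing import Dict
from urllib.parse import parse_qs, urlparse

# The five conventional UTM parameters (assumed list for this sketch).
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def utm_from_landing_page(url: str) -> Dict[str, str]:
    """Return the UTM parameters present in a landing page URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list per key; keep the first value of each UTM key.
    return {key: query[key][0] for key in UTM_KEYS if key in query}


landing = (
    "https://example.com/pricing"
    "?utm_source=newsletter&utm_medium=email&utm_campaign=spring&ref=abc"
)
utm = utm_from_landing_page(landing)
```

If a redirect strips the query string before the landing page loads, this function returns an empty dictionary for the final URL, which mirrors how UTM values can go missing from a session.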
If you are looking to set up a marketing channel property to measure attribution, note that in Heap the “Marketing Channel” must be created as a defined property. With defined properties in Heap, the conditions are evaluated in the order they are defined. It’s best practice to place cases with more general conditions below cases with more specific conditions to prevent misclassification.
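The importance of ordering can be seen in a first-match-wins evaluation. The sketch below is a hedged illustration of that idea only: the rule labels, the conditions, the `Direct` fallback, and the session fields are hypothetical and do not reflect how defined properties are stored in Heap.

```python
from typing import Callable, Dict, List, Tuple

Session = Dict[str, str]
Rule = Tuple[str, Callable[[Session], bool]]

# Hypothetical channel rules, ordered from most specific to most general.
RULES: List[Rule] = [
    ("Paid Search", lambda s: s.get("utm_medium") == "cpc"),
    ("Email", lambda s: s.get("utm_medium") == "email"),
    ("Organic Search", lambda s: "google." in s.get("referrer", "")),
    ("Referral", lambda s: s.get("referrer", "") != ""),
]


def marketing_channel(session: Session, rules: List[Rule] = RULES) -> str:
    """Return the label of the first rule whose condition matches."""
    for label, matches in rules:
        if matches(session):
            return label
    return "Direct"


paid = {"utm_medium": "cpc", "referrer": "https://www.google.com/"}
ordered = marketing_channel(paid)
misordered = marketing_channel(paid, list(reversed(RULES)))
```

Here `ordered` is `Paid Search`, while `misordered` is `Referral`, because the general referrer rule is evaluated first and shadows the more specific paid rule.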
Bots & Scrapers
Google Analytics, by default, will not filter any traffic, including low quality traffic like scrapers and bots. Heap automatically filters many different scrapers and bots.
You can block some bots in Google Analytics by going to Admin > View Settings, and checking Exclude all hits from known bots and spiders.
This won’t exclude bot data that’s already been collected, and since Google doesn’t necessarily block the same bots and scrapers that Heap blocks, there is a chance that the session counts will still differ.
Clicking on any Google AdWords link that leads to your pages generates a new session, even if the visitor had a pre-existing session.
For example, clicking through an AdWords link, then clicking back, and clicking the same AdWords link again will create two sessions. This doesn’t usually create a large disparity between Heap and Google Analytics, but if a large fraction of your traffic lands on your site via AdWords, it may become noticeable.
We don’t capture AdWords auto-tagging. While Heap captures the gclid parameter if it is present in the URL, we can’t derive any meaning from it. This is because Google sets the gclid parameter for their own use and doesn’t share any information about it externally. Heap may also lose the gclid when page redirects clear it from the URL before Heap has a chance to load.
To track Google Ads (AdWords) campaigns in Heap, we recommend using manual UTM tagging to set the source, campaign, etc. for landing pages. Heap automatically captures UTM parameters for each session by pulling the UTM query parameters from the session’s landing page URL. See Marketing: Adwords: Manual & Auto-Tagging Best Practices for information on how to set this up.
Google Analytics Properties
If you’re looking at the session counts in Google Analytics, it’s important to make sure that you’re looking at All Web Site Data. If you’ve got Google Analytics installed on multiple parts of your site, and you’re using multiple Properties within Google Analytics, looking at just one of them may be excluding session data from different parts of your site.
It’s important to make sure that you’ve got Heap and Google Analytics installed on all the same places before attempting to compare data between them. If you have a blog, marketing pages, and a web application, and Heap and Google Analytics are not on all the same parts, reported numbers are very unlikely to match up.
If you’ve built a funnel in Heap and you’re comparing the results to a Conversion Goal in Google Analytics, you’ll find that they often differ significantly.
Heap’s funnels count the number of unique users who complete each step of the funnel in the time range selected, which is the past week, by default.
Goal Conversions in Google Analytics count the number of sessions in which the goal steps were completed. These steps must be completed in the same session, and a single unique user could complete these goal steps multiple times.
There is not a good way to match up Heap’s funnel conversion rates to Google’s Goal Conversions.
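The difference between the two counting methods can be reproduced on a small event log. The sketch below is a simplified model under stated assumptions: the `(user, session, step)` tuples, the two-step funnel, and both function names are illustrative, and events are assumed to be in chronological order.

```python
from typing import Dict, Hashable, List, Tuple

Event = Tuple[str, str, str]  # (user_id, session_id, step)
STEPS = ["checkout", "purchase"]


def _completions(events: List[Event], steps: List[str], by_session: bool) -> int:
    """Count keys (users, or user+session pairs) that finish every step in order."""
    progress: Dict[Hashable, int] = {}
    for user, session, step in events:
        key = (user, session) if by_session else user
        done = progress.get(key, 0)
        if done < len(steps) and step == steps[done]:
            progress[key] = done + 1
    return sum(1 for done in progress.values() if done == len(steps))


def users_completing_funnel(events: List[Event], steps: List[str]) -> int:
    return _completions(events, steps, by_session=False)


def sessions_completing_goal(events: List[Event], steps: List[str]) -> int:
    return _completions(events, steps, by_session=True)


events = [
    ("u1", "s1", "checkout"), ("u1", "s1", "purchase"),
    ("u1", "s2", "checkout"), ("u1", "s2", "purchase"),
    ("u1", "s3", "checkout"), ("u1", "s3", "purchase"),
    ("u2", "s4", "checkout"), ("u2", "s5", "purchase"),  # split across sessions
]
funnel_users = users_completing_funnel(events, STEPS)
goal_sessions = sessions_completing_goal(events, STEPS)
```

In this sample, `funnel_users` is 2 and `goal_sessions` is 3: user `u1` converts in three separate sessions but counts once as a user, while `u2` converts as a user without completing the goal in any single session.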
There is no chart in Heap that will give the same results as the unique pageviews report in Google Analytics.
In Google’s own words: Unique Pageviews is the number of sessions during which the specified page was viewed at least once. A unique pageview is counted for each page URL + page Title combination.
There are two reasons why this report doesn’t map to a report in Heap. First, it’s measuring the count of sessions, not pageviews, and second, a pageview in Google Analytics is a URL + Title combination. In Heap, a pageview is a page load, or for single-page applications, a pushState event that changes the URL.
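A small example makes the two definitions concrete. This sketch follows the definition quoted above for unique pageviews and treats every row as one pageview for the other count; the row layout, function names, and sample data are assumptions for illustration.

```python
from typing import List, Tuple

Pageview = Tuple[str, str, str]  # (session_id, url, title)

views: List[Pageview] = [
    ("s1", "/pricing", "Pricing"),
    ("s1", "/pricing", "Pricing"),         # reload in the same session
    ("s1", "/pricing", "Pricing | Sale"),  # same URL, different title
    ("s2", "/pricing", "Pricing"),
]


def unique_pageviews(rows: List[Pageview]) -> int:
    """Sessions counted once per URL + title combination."""
    return len({(session, url, title) for session, url, title in rows})


def total_pageviews(rows: List[Pageview]) -> int:
    """Every page load counts as one pageview."""
    return len(rows)


unique_count = unique_pageviews(views)
total_count = total_pageviews(views)
```

For these four rows, `unique_count` is 3 and `total_count` is 4, and the title change alone adds a unique pageview even though the URL is identical.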
If your Google Analytics reports are sampled, even for a report as simple as total sessions, these reports will likely not match your Heap data. In general, the shape of the curve is usually pretty close, but the final counts will be off by a percentage that scales with the extent of the sampling being done. The discrepancy will grow for any report that drills deeper into the data, such as funnels and filtered reports.
New Users in Heap is a built-in segment. When charted, it shows the count of users that first had a Heap cookie set on a particular day. Unless you installed Google Analytics on the same day as Heap, these counts will rarely map directly to a report in Google Analytics. The default new vs. returning users report in Google Analytics also shows only the percentage of sessions, not the number of unique users.
To learn how to map new vs. returning users in Heap, see How do I analyze my new vs. returning users?
Single Page Applications
Heap will record a higher count of pageviews than Google Analytics in single page applications. This is because Heap captures URL changes and/or pushState changes as new pageviews, and Google Analytics does not. For more details, see Google Analytics developer documentation on Single Page Application Tracking.
When comparing a retention report in Mixpanel to one in Heap, the values are likely to be very different, with the Mixpanel report typically inflating both the total number of users in a cohort, as well as the retention percentages.
Their documentation on this states the following:
A customer can only be counted once per bucket, but can be included in more than one bucket. For example, if you are bucketing based on your “item purchased” event and creating weekly buckets, a customer who purchased at least one item each week will be in every bucket, not just the bucket for their first purchase.
In Heap, a user is never included in more than one cohort. Since Mixpanel buckets a single user into multiple cohorts, this means that user counts in each cohort may be inflated, and the retention percentage for each cohort can also be inflated.
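The two bucketing approaches can be compared on a handful of events. The sketch below is an illustrative model, not either product’s implementation: the `(user, week)` rows and both function names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[str, int]  # (user_id, week in which the event happened)

purchases: List[Row] = [("u1", 1), ("u1", 2), ("u1", 3), ("u2", 2), ("u3", 3)]


def single_cohort(rows: List[Row]) -> Dict[int, Set[str]]:
    """Place each user only in the cohort of their first event."""
    first_week: Dict[str, int] = {}
    for user, week in rows:
        first_week[user] = min(week, first_week.get(user, week))
    cohorts: Dict[int, Set[str]] = defaultdict(set)
    for user, week in first_week.items():
        cohorts[week].add(user)
    return dict(cohorts)


def multi_bucket(rows: List[Row]) -> Dict[int, Set[str]]:
    """Place each user in every bucket where they have an event."""
    buckets: Dict[int, Set[str]] = defaultdict(set)
    for user, week in rows:
        buckets[week].add(user)
    return dict(buckets)


single_sizes = {week: len(users) for week, users in single_cohort(purchases).items()}
multi_sizes = {week: len(users) for week, users in multi_bucket(purchases).items()}
```

With this data, `single_sizes` is `{1: 1, 2: 1, 3: 1}` and `multi_sizes` is `{1: 1, 2: 2, 3: 2}`: the repeat purchaser `u1` enlarges the later buckets only under the multi-bucket approach.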
Discrepancy settings checklist
Check the following settings across the reports where you are seeing a discrepancy:
- Date ranges and time zones
- Comparison of equal installations (ex. the correct Heap environment to the correct equivalent Google Analytics property)
- Google Analytics reports are not sampled
- Any IP & Bot filtering settings
- Make sure you are comparing the same metric (ex: sessions vs sessions)
Check some simple reports
If double-checking settings doesn’t address your issue, we recommend setting up two similar reports on both platforms to compare data.
Pick the main metric of concern (users, sessions, or pageviews) and set up a usage over time chart of a basic count over a sample date range. Refrain from filtering, and check whether the total counts differ by more than 15%.
If the difference is greater than 15%, group both reports by a single property common to both platforms, such as:
|Metric|Properties to group by|
|---|---|
|Users|Location, IP address|
|Sessions|Landing page, IP address, device type|
|Pageviews|Domain, path, device type|
Use only one property at a time to keep it simple. Try to avoid properties with a very large number of possible values.
If the difference appears to be isolated only to a specific grouping, investigate internally why that variance is occurring.
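If you export the grouped counts from both tools, the comparison can be automated. This is a minimal sketch under stated assumptions: the counts are made-up sample data, the function names are hypothetical, and the difference is measured relative to the larger of the two counts, which is one reasonable choice rather than a documented formula.

```python
from typing import Dict

THRESHOLD = 0.15  # the 15% guideline used in this section


def relative_difference(a: float, b: float) -> float:
    """Difference as a fraction of the larger count (0.0 when both are zero)."""
    larger = max(a, b)
    return abs(a - b) / larger if larger else 0.0


def groups_over_threshold(
    heap: Dict[str, int], other: Dict[str, int], threshold: float = THRESHOLD
) -> Dict[str, float]:
    """Return the groups whose counts differ by more than the threshold."""
    flagged = {}
    for group in sorted(set(heap) | set(other)):
        diff = relative_difference(heap.get(group, 0), other.get(group, 0))
        if diff > threshold:
            flagged[group] = round(diff, 3)
    return flagged


heap_sessions = {"desktop": 1000, "mobile": 800, "tablet": 100}
other_sessions = {"desktop": 980, "mobile": 1200, "tablet": 105}
flagged = groups_over_threshold(heap_sessions, other_sessions)
```

For the sample counts, only `mobile` is flagged, with a relative difference of about 33%, which narrows the investigation to a single device type.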
Bot Traffic Considerations
When comparing reports, you may see unusual spikes in traffic from Heap that inflate overall counts. Some common patterns that suggest bot traffic capture in Heap are:
- A high spike in users and/or sessions with the same IP address on a specific day or week. If this is the case, try looking up this IP to see if it’s tied to known bot traffic.
- Data from Ashburn, VA
- A group of users with identical session behavior (ex. a session with the same 5 events)
- Traffic from very outdated browser versions
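The first pattern in the list above, a spike from a single IP address, can be checked against exported session data. The sketch below is a rough heuristic, not Heap’s bot filtering: the `(date, ip)` rows, the function name, and both thresholds are arbitrary assumptions that you would tune to your own traffic.

```python
from collections import Counter
from typing import List, Tuple

SessionRow = Tuple[str, str]  # (date, ip_address)


def suspicious_ips(
    sessions: List[SessionRow], min_sessions: int = 50, min_share: float = 0.2
) -> List[Tuple[str, str, int]]:
    """Flag (date, ip, count) where one IP dominates a day's sessions."""
    per_day = Counter(day for day, _ip in sessions)
    per_day_ip = Counter(sessions)
    flagged = []
    for (day, ip), count in sorted(per_day_ip.items()):
        if count >= min_sessions and count / per_day[day] >= min_share:
            flagged.append((day, ip, count))
    return flagged


# Sample data: 100 ordinary visitors plus 60 sessions from a single address.
sessions = [("2024-03-01", f"198.51.100.{i}") for i in range(100)]
sessions += [("2024-03-01", "203.0.113.7")] * 60
spikes = suspicious_ips(sessions)
```

In the sample, the single address accounts for 60 of 160 sessions on that day and is the only entry returned, making it a candidate for an IP lookup.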
Some important information about Heap’s functionality related to bot filtering:
- By default, Heap does basic bot filtering based on a list of user agents known to be bots.
- In the Heap app, it’s possible to block future capture from the same bot source by adding these IP addresses to the excluded IP address settings section. To learn how, see How do I get Heap to ignore data from certain IP addresses?
- To prevent and manage any future bot attacks, we recommend using a dedicated bot filtering tool.
If none of the above helps, or if you’re unsure whether or not the cause is bot traffic, contact us via the Get support page with the following information:
- Links to your report in Heap (created in the section above) that show the discrepancy.
- Screenshots of the full report you are comparing this data to in Google Analytics or another tool along with the results.
- Confirm whether the issue is related to user, event, or session counts, or to attribution properties.
- Confirm which platform shows higher numbers.
- Confirm if this discrepancy has been consistent over time (ex. 3-6 months) or has just shown up recently.