Data Discrepancy: Troubleshooting

Two tracking and reporting systems can show different numbers, even for the same time range.

Example: The Ingenious system and an internal reporting tool (like Google Analytics) show a different number of sales for the Affiliate channel.

In this case, it helps to analyze systematically where the discrepancy comes from. Here are the most common and important questions to ask:

Which data? The first question is obvious, but worth double checking: Should and could the data in both systems match? Are we looking at the same metrics and sources?

Compare data: Target and actual values

If two systems report different results, the recommended analysis is to compare raw data (or data that is as close to raw as possible). This means looking at single clicks or single conversions for a given time period. Often 1-3 days of data is enough for a good analysis.

Solution: Get as detailed as possible

By looking at single conversions or single clicks, missing data points (such as missing conversions or missing clicks) can be identified. Once this is done, one can search for details and patterns that the missing data points have in common.
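As a minimal sketch of this comparison, the following Python snippet matches conversion IDs from two exports and lists the ones each system is missing. The column names (order_id, transaction_id) and the sample rows are assumptions made up for illustration; real exports will use their own column names.

```python
import csv
import io


def load_ids(csv_text, id_column):
    """Return the set of non-empty conversion IDs found in a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column].strip() for row in reader if row[id_column].strip()}


def compare_ids(ids_a, ids_b):
    """Split IDs into those only in A, only in B, and present in both."""
    return {
        "only_in_a": sorted(ids_a - ids_b),
        "only_in_b": sorted(ids_b - ids_a),
        "in_both": sorted(ids_a & ids_b),
    }


# Made-up sample data standing in for two exports of the same day.
# Column names are hypothetical and differ per system.
export_a = """order_id,amount
1001,49.90
1002,15.00
1003,120.00
1004,9.99
"""
export_b = """transaction_id,revenue
1001,49.90
1003,120.00
1005,30.00
"""

result = compare_ids(
    load_ids(export_a, "order_id"),
    load_ids(export_b, "transaction_id"),
)
print(result["only_in_a"])  # conversions that system B is missing
print(result["only_in_b"])  # conversions that system A is missing
```

The IDs listed under only_in_a and only_in_b are the missing data points to investigate further.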

Data sources (for example reports or exports) that are frequently used for finding discrepancies:

  • Touchpoints export: Contains details of the winning touchpoint(s) for each conversion (Finance > Conversions)

  • Conversions export: Contains details of each conversion (Finance > Conversions)

  • Data Warehouse: This is where you can access all the data Ingenious has, usually via BigQuery and SQL queries. Useful if more detail is needed than the reports above provide.

Things to look out for in data

There can be any number of reasons for data discrepancies. However, experience shows that it is worth taking a closer look at some usual suspects in the data, such as:

  • Time zone

  • Click filters

  • Bots and Bot filters (Ingenious filters bot traffic automatically)

  • Browser information (e.g. one browser is missing or underrepresented in the data)

  • Mobile traffic

  • IP addresses (truncated): Sometimes one IP address or IP range may be suspicious

  • Referer data

  • Own data that may provide hints, e.g. customer IDs

  • Traffic from testing systems / staging
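One way to search for such patterns is to check whether a value (for example one browser) is overrepresented among the missing data points compared with all data points. The sketch below assumes each conversion is a simple dictionary with a browser field; the field names, the sample data and the threshold factor of 1.5 are illustrative assumptions, not Ingenious defaults.

```python
from collections import Counter


def share_by(records, key):
    """Return each value's share (0..1) of the records for the given key."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def overrepresented(all_records, missing_records, key, factor=1.5):
    """Values whose share among missing records is at least `factor` times
    their share among all records."""
    overall = share_by(all_records, key)
    missing = share_by(missing_records, key)
    return sorted(
        value
        for value, share in missing.items()
        if share >= factor * overall.get(value, 0.0)
    )


# Made-up sample: 5 Chrome, 3 Safari and 2 Firefox conversions.
browsers = ["Chrome"] * 5 + ["Safari"] * 3 + ["Firefox"] * 2
all_conversions = [{"id": i, "browser": b} for i, b in enumerate(browsers)]

# IDs that the second system did not record (hypothetical).
missing_ids = {1, 5, 6}
missing = [c for c in all_conversions if c["id"] in missing_ids]

suspects = overrepresented(all_conversions, missing, "browser")
print(suspects)
```

The same function can be applied to any other attribute from the list above, such as device type, referer or truncated IP range, by changing the key.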

 

Experience shows: Data is often missing because one system tracks better than the other.

Example: Ingenious first-party tracking is almost always more precise than Google Analytics. Ingenious often collects between 30% and 70% more data, because Google Analytics is often blocked by browsers and ad blockers.

 

So the research task is to find reasons for missing data. If the search for patterns within the data does not help, it makes sense to take a look at where the data comes from: 

 

Where does the data come from?

How does a system collect data? What is the original source? Typical possibilities are:

  • collected in a browser (e.g. via JavaScript; cookies may be required)

  • sent by another server

  • fetched via API

 

In most cases, tracking systems collect data in a browser. In these cases, however, tracking quality can vary heavily depending on the following:

  • Is first-party or third-party tracking implemented (domain of the tracking server as well as domain of the cookies)? Third-party tracking can mean heavy data loss due to browser restrictions.

  • Is the tracking domain well known (and hence likely to be recognized by ad blockers etc.)?

  • Is user consent required before the tracking script can be executed?

  • Are cookieless technologies used (click IDs, server-side tracking and others)?

  • For conversion tracking: Is a tracking switch used? Which rules are applied to execute a tracking tag?

 

How is data processed and aggregated?

The way data is aggregated has a significant impact on results such as reports. Example: Which time zone is used?
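The time zone effect can be illustrated with a small sketch: the same four conversions produce different daily totals depending on the time zone used for grouping. The timestamps are made up, and the fixed offset of UTC+2 is an assumption that ignores daylight saving time changes.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_counts(utc_timestamps, offset_hours):
    """Count events per calendar day after shifting UTC timestamps
    to a fixed UTC offset (no daylight saving handling)."""
    tz = timezone(timedelta(hours=offset_hours))
    days = (
        datetime.fromisoformat(ts)
        .replace(tzinfo=timezone.utc)  # input strings are treated as UTC
        .astimezone(tz)
        .date()
        .isoformat()
        for ts in utc_timestamps
    )
    return dict(Counter(days))


# Made-up conversion timestamps, recorded in UTC.
conversions_utc = [
    "2024-06-01T21:30:00",
    "2024-06-01T22:15:00",
    "2024-06-01T23:45:00",
    "2024-06-02T08:00:00",
]

report_utc = daily_counts(conversions_utc, 0)
report_plus2 = daily_counts(conversions_utc, 2)
print(report_utc)    # three conversions on June 1, one on June 2
print(report_plus2)  # one conversion on June 1, three on June 2
```

Both reports contain the same four conversions, but a day-by-day comparison would show a discrepancy on each date.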

Processing and the rules applied are also important. Example: attribution rules. The best-known one is "last touchpoint wins"; however, this often leads to misleading results, since it may prioritize owned and earned channels over paid ones or ignore previous touchpoints in customer journeys.
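To show how much the attribution rule alone can change a channel report, the following sketch applies two simple rules to the same customer journeys. The journeys, channel names and the "first touchpoint wins" comparison rule are illustrative assumptions and do not describe the attribution logic of Ingenious or any other specific system.

```python
from collections import Counter


def last_touch(journey):
    """Return the most recent touchpoint of a journey."""
    return max(journey, key=lambda tp: tp["ts"])


def first_touch(journey):
    """Return the earliest touchpoint of a journey."""
    return min(journey, key=lambda tp: tp["ts"])


def attribute(journeys, rule):
    """Count conversions per channel; `rule` picks the winning touchpoint."""
    return dict(Counter(rule(j)["channel"] for j in journeys if j))


# Made-up journeys: each list holds the touchpoints before one conversion.
journeys = [
    [
        {"channel": "Affiliate", "ts": "2024-06-01T09:00:00"},
        {"channel": "Newsletter", "ts": "2024-06-01T18:00:00"},
    ],
    [
        {"channel": "Affiliate", "ts": "2024-06-01T10:00:00"},
        {"channel": "Direct", "ts": "2024-06-02T11:00:00"},
    ],
    [
        {"channel": "Paid Search", "ts": "2024-06-01T12:00:00"},
        {"channel": "Affiliate", "ts": "2024-06-01T12:30:00"},
    ],
]

by_last = attribute(journeys, last_touch)
by_first = attribute(journeys, first_touch)
print(by_last)   # Affiliate is credited with 1 conversion
print(by_first)  # Affiliate is credited with 2 conversions
```

Two systems that see exactly the same touchpoints can therefore still report different sales for the Affiliate channel if their attribution rules differ.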

 

The easiest data to analyze is data that is as raw as possible, for example single conversions or single clicks.