Thursday, December 12, 2024

Your guide to Google Analytics 4 attribution



Conversion is usually preceded by several interactions with a website or an app.

Attribution determines the role of each touchpoint in driving conversions and assigns credit for sales to interactions in conversion paths.

Therefore, it’s crucial to understand attribution in Google Analytics 4 (GA4).

(If you are new to attribution, read the Google Analytics help article on attribution first.)

How Google Analytics 4 attribution works

Universal Analytics reports attributed the entire credit for the conversion to the last click. A direct visit was not considered a click, but for the avoidance of doubt, this attribution model was also called the last non-direct click model. Other attribution models were only available in the Model Comparison Tool in the Multi-Channel Funnels (MCF) reports section.

GA4 offers a wider availability of different attribution models, but it depends on the scope of the report – whether it is the user acquisition source, session source or event source. 

In Universal Analytics, the source dimensions had session scope only. The MCF reports made it possible to analyze the sources of all sessions on the conversion path. The three scopes of the source dimension in GA4 (user, session, event) are the most important and fundamental change in the attribution area.

This guide uses the term “source” in a broader sense, to mean any dimension that indicates the origin of a visit (e.g., channel grouping, source, medium, ad content, campaign, ad group, keyword, search term, etc.).

In 2024, Google modified the terminology in Analytics, and what were previously known as conversions are now called key events. The term “conversion” in Google Analytics will be reserved for Google Ads conversions imported from Google Ads.

Session source

Session-scope attribution, unsurprisingly, determines the source of the session. It is used, among other places, in the Traffic acquisition reports in the Reports section.

The session source is the source that started the session (e.g., social media referral or organic search result). However, if a direct visit started a session, the session source will be attributed to the source of the previous session (if there was any). 

Quick reminder: A direct visit means that Analytics does not know where the user came from because the click does not pass the referrer, gclid, or UTM parameter.

The session source will be direct only if Analytics cannot see any other source of visit for the given user within the lookback window. The default lookback window in GA4 is 90 days. We will return to the lookback window matter later in this article.
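The fallback rule described above can be sketched in a few lines of Python. This is a minimal illustration, not GA4's actual implementation: the function name, the "(direct)" label, and the data layout are assumptions made for the example.

```python
from datetime import datetime, timedelta

# Default key event lookback window mentioned in the text.
LOOKBACK = timedelta(days=90)


def session_source(session_start, raw_source, earlier_sessions):
    """Return the source reported for a session (hypothetical helper).

    raw_source: source observed at session start, or "(direct)".
    earlier_sessions: list of (start_time, reported_source) for the same user.
    """
    if raw_source != "(direct)":
        return raw_source
    # Direct visit: fall back to the most recent non-direct source
    # that is still inside the lookback window.
    candidates = [
        (start, src)
        for start, src in earlier_sessions
        if src != "(direct)"
        and timedelta(0) <= session_start - start <= LOOKBACK
    ]
    if not candidates:
        return "(direct)"
    return max(candidates, key=lambda item: item[0])[1]


now = datetime(2024, 12, 1)
history = [
    (now - timedelta(days=120), "google / cpc"),        # outside the window
    (now - timedelta(days=10), "newsletter / email"),   # inside the window
]
print(session_source(now, "(direct)", history))
```

In this example the direct session inherits the newsletter source, because the paid click is older than 90 days.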

By the way, what is a session?

A Google Analytics session is not the same as a browser session.

In GA4, a session begins when a user visits the website or app and ends after the user’s inactivity for a specified time (30 minutes by default – see this Analytics help article).

Closing the browser window does not end the session. If the browser window is closed, another visit to the website within the time limit will still belong to the same session – unless the browser deletes cookies and browser data after closing the browser window, for example in incognito mode.

If a visit from a new source occurs during a session, a new session will not start, and the source of the current session will remain unchanged.

It does not mean that the visit from the new source is ignored. GA4 records the source of this visit, and the event-scope attribution reports (more on that later in this article) will take into account all sources of all sessions. (See this Analytics help article.)
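To make the session rules above concrete, here is a small Python sketch. It assumes a simplified model in which hits are timestamps in minutes and a hit carries a source only when it arrives from a new source; the field names are hypothetical, and the fallback of direct sessions to an earlier source is deliberately left out.

```python
SESSION_TIMEOUT_MIN = 30  # default inactivity timeout mentioned in the text


def sessionize(hits):
    """Group hits into sessions.

    hits: list of (minute, source_or_None), sorted by time.
    A new source arriving mid-session is recorded but does not
    start a new session or change the session source.
    """
    sessions = []
    for minute, source in hits:
        if sessions and minute - sessions[-1]["last_hit"] < SESSION_TIMEOUT_MIN:
            current = sessions[-1]
            current["last_hit"] = minute
            if source is not None:
                current["sources_seen"].append(source)
        else:
            first = source or "(direct)"
            sessions.append(
                {"session_source": first, "sources_seen": [first], "last_hit": minute}
            )
    return sessions


hits = [
    (0, "google / organic"),
    (10, None),
    (20, "paypal.com / referral"),  # return from a payment gateway
    (70, None),                     # 50 minutes of inactivity: new session
]
for s in sessionize(hits):
    print(s["session_source"], s["sources_seen"])
```

The payment gateway referral is kept in the list of sources seen, but the first session still belongs to organic search and the session count is not inflated.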

A new visit during an existing session may happen, for example, if a user returns from a payment gateway or a webmail site after password recovery or registration confirmation. These visits will not artificially inflate the number of sessions. 

Nevertheless, sources of these visits are so-called unwanted referrals and should be excluded. Visits from excluded referrals are reported as direct visits.

In GA4, these visits are de facto ignored because the session source and the session count remain unchanged. The non-direct attribution modeling in GA4 will assign no credit to this (direct) source (as described later in this article).

First user source 

First user source (source of the first visit) is new to GA4. It shows where the user came from when visiting the website or opening the app for the first time.

It is a part of Google’s new approach to measurement in online marketing, which no longer focuses only on the classic ROAS (revenues vs. costs), but also analyzes the CAC vs. LTV (customer acquisition cost vs. lifetime value).

This approach reflects the app logic: we have to acquire the app user first, and after the app is installed, further marketing efforts engage and monetize the user. However, it also makes sense for web traffic.

The new customer acquisition goal in Google Ads, available in Performance Max campaigns, also represents a similar approach. In this case, the focus is on the first-time buyer, not the first visit. 

In GA4, the first user visit is recorded by the first_visit event for the website or the first_open event for the app. The naming is self-explanatory.

Therefore, the source of the first visit is a user attribute and indicates where this user’s first visit to the website or application came from.

The first visit source is attributed using the last non-direct click model. Of course, this attribution applies only to interactions before the first website visit or the first open of the app (interactions following the first visit or first open are not taken into account).

Once assigned, the source of the first visit remains unchanged – of course, as long as Google Analytics can technically link the user’s activity on the website and in the app with the same user.

The first user source will be reset if the tracking of the user is lost, for example, if the user does not visit the website for a period longer than the Analytics cookie expiration date.

We will return to the Analytics cookie expiration period and other data collection limitations in GA4 later in this article.

Event scope attribution

In GA4, events replaced sessions as the foundation of data collection and reporting. Google Analytics makes it possible to report attribution using a selected attribution model only for key events.

The model is set in the Attribution Settings of the GA4 property. There are several pre-defined models to choose from (see the screen below).

Screenshot: Attribution settings

The default data-driven model can be changed at any time. This change is retroactive (i.e., it will also change the historical data).

A common belief is that Google Analytics 4 no longer uses the last-click attribution model. But is that the case?

In practice, the model selected in the property settings applies only to customized reports that use event-scope dimensions and metrics, for example, Medium – Key events.

The default traffic and user acquisition reports use session source and first user source, respectively, and these dimensions use the last click model. It is indicated in the dimension name (e.g., Session – Campaign or First User – Medium).

Remember: source, session source and first user source are three different dimensions where different attribution models apply.

Scope | Attribution model | Where available
Session | Last click | E.g., traffic acquisition reports
User (first user source) | Last click | E.g., user acquisition report
Event | Model set in the GA4 property settings (data-driven by default) | E.g., in the Explore section

Attribution settings

The attribution model set in the property settings applies to all reports in the property that use event-scope attribution.

There are several attribution models to choose from (described in the earlier mentioned Analytics help article). However:

  • None of the models assign value to direct visits unless there is no other choice because there is no other interaction on the path. In other words, they all use the non-direct principle. 
  • The Ads-preferred models assign the entire value of the key event to Google Ads interactions if they occur in the funnel. There is only one Ads-preferred model available: the last click model. In the absence of Google Ads interactions on the funnel, this model works like a regular last-click model.
  • In addition to clicks, models take into account “engaged views” of YouTube ads, that is, watching the ad for 30 seconds (or until the end if the ad is shorter) and other clicks associated with that ad (see this Google Analytics help article for more details).
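The first two rules in the list above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: touchpoints are reduced to channel names, the labels "Direct" and "Google Ads" are chosen for the example, and the path is assumed to be non-empty. It is not how GA4 computes attribution internally.

```python
def last_click(path):
    """Last non-direct click.

    path: ordered list of channel names, oldest first (non-empty).
    Returns a {channel: credit} mapping.
    """
    non_direct = [c for c in path if c != "Direct"]
    # Direct receives credit only when there is nothing else on the path.
    eligible = non_direct or path
    return {eligible[-1]: 1.0}


def ads_preferred_last_click(path):
    """Give all credit to Google Ads if it appears anywhere on the path,
    otherwise behave like the regular last-click model."""
    if "Google Ads" in path:
        return {"Google Ads": 1.0}
    return last_click(path)


path = ["Google Ads", "Organic Search", "Direct"]
print(last_click(path))
print(ads_preferred_last_click(path))
```

For the same path, the regular model credits organic search, while the Ads-preferred model credits Google Ads.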

Again, a change of the attribution model settings works retroactively (i.e., it applies to the historical data before the change). Saved explorations will be recalculated when viewing them.

Lookback window

Google Analytics property settings determine the length of the lookback window. The lookback window determines how far back in time a touchpoint is eligible for attribution credit. The default lookback window is 90 days, but you can change it to 60 or 30 days.

Screenshot: Attribution settings - Key event look-back window

According to Analytics documentation, the lookback window settings apply to all attribution models and all key event types in Google Analytics 4 (i.e., it also applies to session-level attribution and attribution model comparisons).

The lookback window of the first user source has a separate setting (30 days by default, and it can be changed to 7 days). Are you wondering why it is defined differently? 

Well, first of all, it is worth considering why there is any lookback window for the first visit at all.

Moreover, why are we talking about the first user attribution model, which is always the last (non-direct) click?

After all, GA4 knows the source of the first visit when this visit happens. As it is the first visit, there are no previous visits, and thus no other sources to consider.

So, what is the point of looking deeper in time than the first interaction with a website or app?

Google Analytics 4 is designed to blend data collected by the website’s tracking code with information known by Google about the users, especially if they are logged in to Google services.

For example, Google may know that the user had an engaged interaction with our YouTube ad on a different device before the first visit.

Similarly, the user may use the app for the first time (first_open) during a direct session, but the install itself may result from a mobile app install campaign in Google Ads, clicked a few days earlier. 

Therefore, if the source of the first visit session is unknown (it is a direct visit), Google Analytics may try to assign the source of the first visit to the earlier known interaction if it occurred during the lookback window period.

In other words, GA4 may take into account ad interactions that occurred before the user’s first visit.

Lookback window changes do not work retroactively. It means that they only apply from the moment of the change.

The engaged views of YouTube ads, however, always have a three-day lookback window, regardless of the property settings.



Universal Analytics’s default lookback window for the acquisition reports was six months. Any change to this period was also non-retroactive. 

Such a change, however, did not apply to conversions (now key events) but to interactions that had taken place after the change. It reflected the logic of the _utmz cookie, which was responsible for storing the source information.

Its expiration time was set when the cookie was created or updated (i.e., upon a visit from a given source).

For example, changing the lookback window in Universal Analytics from 30 to 90 days did not immediately include interactions from 90 days ago in the acquisition reports for the visits since the date of the change because the virtual “source cookie” for interactions older than 30 days had already “expired.”

There was a transition period (in this example, 90 days), after which all key events were fully reported under the new lookback window. 

Google Analytics 4 uses a different data model. Google could therefore have broken with this past and stopped using the cookie logic.

For example, the change could apply to all key events that have taken place since the change, as it does now in Google Ads. Interpreting such a change would be much easier. Google could have done that, but it did not.

In GA4, the change applies to interactions still in the lookback window. 

For example, if the lookback window is increased from 30 to 90 days, the key events will not immediately be reported in the new, 90-day lookback window. The full window will be reflected in the reports 60 days after the date of change (the interactions from the initial 30-day lookback window will be remembered).

Reducing the lookback window (e.g., from 90 to 30 days) will apply the change immediately (i.e., all key events will be reported in the shorter, 30-day window). 
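The arithmetic behind this transition can be written down as a one-line rule. The function below is only an interpretation of the behavior described above (it is not a documented GA4 formula), and its name and parameters are made up for the illustration.

```python
def effective_lookback_days(old_days, new_days, days_since_change):
    """Lookback window actually reflected in reports after a settings change.

    Assumption: a reduction applies at once, while an increase grows
    one day at a time, starting from the old window.
    """
    if new_days <= old_days:
        return new_days
    return min(new_days, old_days + days_since_change)


for day in (0, 20, 60, 100):
    print(day, effective_lookback_days(30, 90, day))
```

With an increase from 30 to 90 days, the effective window reaches 90 days exactly 60 days after the change, which matches the example in the text.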

Yes, it sounds exotic. Fortunately, in practice, the analysts do not change the lookback window often. 

The Google Analytics 4 cookie has a standard expiration time of 24 months, but it can be changed to a period between one hour and 25 months (or the cookie may be set as a session cookie and expire after the browser session end).

Subsequent visits may renew this time limit. This will be the period in which Analytics will be able to recognize a returning user and remember the source of the first visit (see this GA4 help article).

However, it does not automatically mean that GA4 will “remember” user data that long.

In addition to the cookie expiration, we also have to deal with the GA4 data retention period. It is set by default to only two months, but you can (and basically, you should) change this setting to 14 months. (In the paid version, Google Analytics 360, it can be up to 50 months.)

After this time, Google deletes user-level data from Analytics servers. To keep this data, you must export it to BigQuery (see this GA4 help article).

It means that reports in the Explore section can only be made within the data retention period (please note that in the Explore section, you cannot select a date range beyond this period).

These restrictions do not apply to standard reports in the Reports section that use aggregated data. GA4 will store this data “forever.” 

In the unpaid version of GA4, the first user source data are deleted after 14 months of inactivity. After that, this user will be recorded as a new user.

Therefore, there is no point in, for example, changing the cookie expiration time from the default 24 months to a longer period, unless you use Google Analytics 360. 

Conversion export to Google Ads

Exporting conversions to Google Ads is often used as an alternative to the native Google Ads conversion tracking as the fastest and most convenient way to implement conversion tracking in Google Ads. 

However, this time-saving seems illusory in the era of Google Tag Manager. 

In GA4, the conversion import has flexible options, so it is important to understand the differences between the available settings. 

In Universal Analytics and the earlier versions of GA4, the conversions were solely exported using Analytics’ last-click attribution model, regardless of the attribution model selected in Google Ads. 

This methodology had problematic implications, particularly if the imported conversions were to be used for Google Ads optimization: 

  • It reduced the number of conversions observed in Google Ads because, as a matter of principle, Analytics attributes conversions to all traffic sources, not only to Google Ads.
  • Such attribution was difficult to interpret, especially if Google Ads used other attribution models for the last-click conversions imported from Analytics.
  • It was vulnerable to unforeseen Google Analytics configuration and link tagging errors, such as unwanted referrals or redundant UTM parameters, which could suddenly increase the credit attributed to other sources. 

Google engineers probably understood this issue and recently added more options. 

Today, if you import conversions from GA4 to Google Ads, the conversions will be imported using the attribution model selected in the Google Ads conversion settings. 

Additionally, it is possible to choose which channels are eligible to receive conversion credit for web conversions shared with Google Ads. You can decide whether your GA4 conversion export attributes conversions: 

  • Only to Google Ads.
  • Or across all channels.  

Attributing only to Google Ads makes the conversion export very similar to native Google Ads tracking. 

The conversions are attributed solely to Google Ads clicks on the attribution path, and no credit is assigned to other channels.

As of June 2023, it is the default setting for properties creating a link between Google Ads and GA4 for the first time. 

Screenshot: Channels that can receive credit

Attribution across channels is the previously existing method. 

If you linked GA4 and Google Ads before June 2023, it should apply to your GA4 property until you change it. 

If you use this option, you should remember that the number and value of conversions will likely be smaller than in the first option or when using native Google Ads conversion tracking. 

Screenshot: Channels that can receive credit - Paid and organic channels

This is because conversions will be partly attributed to other interactions on the conversion path (e.g., social media campaigns or organic traffic). 

If you choose the last-click model for imported conversions, the value attributed to Google Ads can sometimes even be zero. 

It is because you will only import conversions whose Google Ads source has not been overwritten by subsequent clicks from other sources (similar to how it worked in Universal Analytics). 
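The difference between the two export settings can be shown with a simplified Python sketch for the last-click model. The setting names, channel labels, and function are hypothetical, paths are assumed to be non-empty, and the logic is an interpretation of the description above rather than Google's implementation.

```python
def exported_credit(path, channels="google_ads_only"):
    """Share of one conversion exported to Google Ads under last click.

    path: ordered list of channel names, oldest first (non-empty).
    channels: "google_ads_only" or "all_channels" (illustrative names).
    """
    if channels == "google_ads_only":
        # Only Google Ads clicks are eligible, so any Ads click gets the credit.
        return 1.0 if "Google Ads" in path else 0.0
    # Cross-channel: the last non-direct click wins, whatever its channel.
    non_direct = [c for c in path if c != "Direct"] or path
    return 1.0 if non_direct[-1] == "Google Ads" else 0.0


path = ["Google Ads", "Organic Search"]
print(exported_credit(path, "google_ads_only"))
print(exported_credit(path, "all_channels"))
```

Here the organic visit after the ad click overwrites the Google Ads source in the cross-channel setting, so nothing is exported for this conversion, while the Google Ads only setting still exports it in full.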

Regardless of the property-level attribution settings, Google Analytics allows comparisons of different attribution models in the Advertising section.

Currently, the available models are the same as those available in the property settings, and it is impossible to create custom models. 

GA4 allows reporting in two attribution time methods:

  • Interaction time.
  • Key event time.

The interaction time method is typical for advertising systems, where ad conversions are attributed to clicks and, therefore, to costs. It allows a correct match between costs and revenues.

Otherwise, the reports might include key events attributed to a given campaign after the end of the campaign, in a period when there is no ad spend.

On the other hand, the interaction time method may cause the total number of key events to change depending on the attribution model, as different models may attribute key events or their fractions to clicks outside the reporting period.

Moreover, the key event count and revenue for a given reporting period may grow over time until the lookback window closes.

In other words, we may observe more key events for the recent period if we look at the same report in the future – which is not the case when key events are reported in the key event time.
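The two reporting methods can be compared on a single key event. The sketch below is an illustration with made-up data and hypothetical field names; it assumes each touchpoint already carries the fraction of credit assigned by the chosen attribution model.

```python
from collections import Counter
from datetime import date


def credit_by_month(key_events, method):
    """Sum attributed credit per (month, channel).

    key_events: list of dicts with "event_date" and "touchpoints",
    where touchpoints is a list of (date, channel, credit).
    method: "interaction_time" or "key_event_time".
    """
    totals = Counter()
    for ke in key_events:
        for tp_date, channel, credit in ke["touchpoints"]:
            d = tp_date if method == "interaction_time" else ke["event_date"]
            totals[(d.strftime("%Y-%m"), channel)] += credit
    return dict(totals)


purchase = {
    "event_date": date(2024, 12, 3),
    "touchpoints": [
        (date(2024, 11, 20), "Paid Search", 0.5),
        (date(2024, 12, 2), "Email", 0.5),
    ],
}
print(credit_by_month([purchase], "interaction_time"))
print(credit_by_month([purchase], "key_event_time"))
```

With interaction time, half of the December purchase is reported in November, next to the ad spend that caused it. With key event time, the whole purchase stays in December.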

Both approaches have advantages and disadvantages, so it is good that we can now use both.

Attribution paths report

The GA4 attribution paths report is rich with data: days to key event and the number of interactions for a given path (touchpoints to key event).

It partly compensates for the lack of time lag and path length reports, which were separate reports in Universal Analytics.

The ability to choose an attribution model for this report may be surprising at first sight.

The attribution model does not affect attribution paths. They remain the same, and their length (number of touchpoints) and number of days to key event do not change.

Screenshot: Attribution paths report

In GA4, the path visualization also includes the fraction of key events assigned to a given interaction or their series in the selected attribution model.

In the last click model, the last interaction always has a 100% share in the key event, but in the other models, the distribution will be different.

This feature also allows a better understanding of how the data-driven model worked for the interactions in this report. 

Additional bar graphs are placed above the funnel report, visualizing how the selected attribution model assigned a value to channels at the beginning, middle and end of the funnel.

The early touchpoints are the first 25% of the interactions along the path, while the late touchpoints include the last 25%. The middle touchpoints are the remaining 50% of the interactions. 

If you feel that the distribution between early, middle, and late touchpoints does not look as expected for the multi-touch models, please note that if there are only two interactions, there is one early, one late, and no middle interactions.

If there is only one interaction, for the multi-touch models, it will be reported as a late interaction, which distorts these reports the most. 

Probably, it would be better if the only interaction was considered as 33.3% early, 33.3% middle, and 33.3% late interaction.
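The early, middle, and late split can be sketched as follows. The rounding rule (a quarter of the path rounded to the nearest whole touchpoint, with at least one) is an assumption chosen so that the sketch reproduces the cases described above; the function name is hypothetical.

```python
import math


def split_touchpoints(path):
    """Split a path into (early, middle, late) touchpoints.

    Assumed rule: early and late are each 25% of the path, rounded to
    the nearest whole touchpoint; a single touchpoint counts as late.
    """
    n = len(path)
    if n == 0:
        return [], [], []
    if n == 1:
        return [], [], list(path)
    quarter = max(1, math.floor(n / 4 + 0.5))
    early = list(path[:quarter])
    middle = list(path[quarter:n - quarter])
    late = list(path[n - quarter:])
    return early, middle, late


for example in (["A"], ["A", "B"], ["A", "B", "C", "D"]):
    print(split_touchpoints(example))
```

A one-touchpoint path lands entirely in the late group, and a two-touchpoint path has no middle, which explains why the bar charts can look skewed toward late touchpoints.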

Thus, the attribution model will only affect the bar charts at the top of the report and the percentages shown in the funnel visualization.

The table figures (funnel interactions, key events, revenue, funnel length, and days to key event) will remain the same, regardless of the attribution model.

By default, the attribution paths and model comparison reports include all key events in the GA4 property. Therefore, it is worth remembering to select the desired key event(s) first. 

Use of scopes in the reports

Again, the source dimensions in GA4 can have one of three scopes: session, user, and event.

  • In the case of the event scope, the attribution model specified in the property attribution settings is used.
  • The session source (session scope) is assigned to the last non-direct interaction at the session start and remains unchanged for a given session, even if there is a visit from another source during the session. It’s the “first source” of the session, although assigned in the last-click model.
  • Similarly, the first user source (user scope) is assigned to the last non-direct interaction before the first visit and remains unchanged.

In Google Analytics, all dimensions and metrics operate within their own scope. For example, the Landing page dimension has the session scope, and the Page dimension has the event scope.

Although technically possible, using dimensions and metrics of different scopes can sometimes lead to confusing or difficult-to-interpret reports. There is typically little point in making such reports in GA4.

However, some reports using dimensions and metrics of different scopes will make sense. For example, for source dimensions in GA4:

  • The number of events (event scope) paired with the First user source dimension (user scope) shows how many events were generated by users whose first visit was from a given source.
  • The number of events (event scope) paired with the session source dimension (session scope) shows how many events were generated by users during sessions with a given source.
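The two pairings above can be reproduced on a tiny, made-up event log. The field names are hypothetical and do not correspond to the GA4 export schema; the point is only that the same events are counted differently depending on the scope of the source dimension.

```python
from collections import Counter

# Illustrative data: user u1 was acquired by a paid click and later
# returned from a newsletter; user u2 only ever came directly.
events = [
    {"user": "u1", "first_user_source": "google / cpc", "session_source": "google / cpc", "event": "page_view"},
    {"user": "u1", "first_user_source": "google / cpc", "session_source": "google / cpc", "event": "add_to_cart"},
    {"user": "u1", "first_user_source": "google / cpc", "session_source": "newsletter / email", "event": "page_view"},
    {"user": "u1", "first_user_source": "google / cpc", "session_source": "newsletter / email", "event": "purchase"},
    {"user": "u2", "first_user_source": "(direct)", "session_source": "(direct)", "event": "page_view"},
]


def count_events_by(rows, dimension):
    """Event count (event scope) grouped by a source dimension of another scope."""
    return Counter(row[dimension] for row in rows)


print(count_events_by(events, "first_user_source"))
print(count_events_by(events, "session_source"))
```

Grouped by first user source, all four events of user u1 belong to the paid click. Grouped by session source, two of them move to the newsletter.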

The GA4 documentation fails to indicate how to interpret the number of sessions or users matched with the event scope. Such explorations, although possible, often contain many “(not set)” values.

In practice, creating such reports doesn’t make sense. (See the previously mentioned GA4 help article on scopes.)

Modeled and blended data

Finally, it is worth emphasizing the fundamental change in Google Analytics 4, where reports include data collected by the tracking code enriched with modeled data.

The modeled data uses information collected in the cookieless consent mode for users who have not given consent to tracking and data for users logged in to Google. This data is fragmentary, but Google can fill in the missing data using extrapolations and mathematical modeling.

Modeled data is available only for GA4 properties using blended reporting identity.

Thanks to blended data in GA4, we can see an approximate but more complete picture of the user’s journey.

For example, Universal Analytics recorded an iPhone user who visited the website from a YouTube ad using Safari and never returned. Universal Analytics also saw an event made by another user who came from a direct visit on the Chrome browser for Windows.

Google knows these events belong to the same user because this user was logged into Gmail and YouTube. 

This is how Google Analytics 4 can model cross-device user behavior. It makes the reported number of users more realistic (by reducing it) and improves the attribution accuracy.

In the example above, the key event from the direct session can be correctly attributed to the YouTube ad.

Not all users are always logged into Google – many do not even have a Google account.

Therefore, to make the picture more complete, Google Analytics will assume that users who are not logged in behave similarly.

Consequently, GA4 sometimes will supplement the missing sources (e.g., assign certain sources to key events that were previously assigned to direct).

The behavior of users who have not given consent to tracking is estimated similarly.

Analytics knows the number of pageviews and key events from the non-consented users and can model how many users generated these pageviews and conservatively attribute key events to sources.

Enriching Analytics data may take up to a week. Therefore, the recent data may change in the future.

Various privacy-oriented technology solutions, such as PCM by Apple or similar solutions proposed by Google (the Privacy Sandbox), randomly delay event reporting by 24-48 hours.

Therefore, we must get used to the fact that the full view of analytical data will only be available after some time. 

In GA4, we can also enhance the reports using first-party data, namely the User-ID.

GA4 reports combine the User-ID data with the Client-ID (the Analytics cookie identifier) and user-provided data, which makes the data more complete, especially in the cross-device aspect and LTV measurement. 

The complexity of these processes may cause greater or lesser discrepancies between the data in different reports.

We should get used to it, but hopefully, as GA4 improves its algorithms, these discrepancies will become less and less significant.

It is worth remembering that Google Analytics is not accounting software.

Its objective is not to record every event with 100% precision but to indicate trends and support decision-making – for which approximate data is sufficient.

Author’s note: This article was written using Google help articles, answers given by Analytics support and results from my experiments. 

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.


