Traffic sources in Google Analytics 4 in BigQuery

After enabling the export of Google Analytics 4 data to BigQuery, one of the first questions that usually comes up when you look at the generated dataset schema is this: which traffic source should I use?

GA4 includes different structs containing source, medium, and campaign information, but each one works differently and is meant for different kinds of analysis.

In this article, we’ll look at the four traffic source structures you’ll find in BigQuery, what each one represents, and when you should use them.

Understanding the different traffic scopes in GA4

Before going into the structures that Google Analytics 4 provides in BigQuery, it is important to understand why there are several of them in the first place. GA4 uses different traffic attribution scopes because each one answers a different question about user behaviour.

At user level, GA4 tries to identify how a user first arrived. This is fundamental for measuring the effectiveness of acquisition campaigns. That value remains stable over time because it represents the user’s first contact with the brand.

At session level, GA4 identifies the specific source that generated the current visit. What matters here is not the user’s history, but which channel caused this session to start. That is why the attribution follows a last-click / last non-direct click logic, which reflects actual browsing behaviour more closely and is the standard for acquisition analysis.

Finally, at event level, GA4 can capture traffic information at the exact moment each action happens. This is a more granular scope and shows the specific context of each event, such as UTMs that may only be present on certain pages. It is especially useful for detailed funnel analysis, debugging, or situations where sessions and navigation include multiple redirects or different contexts.

Exploring the different traffic source columns

Once it is clear why Google Analytics 4 uses different scopes to define traffic origin, the next step is to look at how this information appears in the BigQuery export. Although the underlying logic is the same, BigQuery organises traffic into four different structures, each with a specific purpose and a different level of detail.
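As a rough orientation, the query below reads one source value from each of the four structures for the same events. It is only a sketch: the table name `your-project.analytics_123456789.events_20251201` is a placeholder (an assumption, not a real table), and your export needs to be recent enough to contain all four structures.

```sql
-- Sketch: one source value per structure, side by side for each event.
-- The table name is a placeholder; replace it with your own GA4 export table.
SELECT
  event_name,
  -- 1) Event parameters: requires a subquery over the repeated event_params field.
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS param_source,
  -- 2) User scope: first-touch source of the user.
  traffic_source.source AS first_user_source,
  -- 3) Event scope: source collected with this specific event.
  collected_traffic_source.manual_source AS event_source,
  -- 4) Session scope: last-click source assigned to the whole session.
  session_traffic_source_last_click.manual_campaign.source AS session_source
FROM
  `your-project.analytics_123456789.events_20251201`
LIMIT 100
```

Comparing the four columns row by row is a quick way to see how the same event can carry different values depending on the scope.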

Session acquisition from event parameters

In BigQuery, one of the oldest and most technical ways to identify the source of a session is through event parameters. Before Google introduced dedicated event-level and session-level traffic source structs, the only way to obtain the source, medium, or campaign of a visit was to extract those parameters manually from event_params.

These values are only recorded when the event occurs in a context where there are UTMs, a valid referrer, or when the event marks the beginning of a session, so they are not always available for every event.
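A minimal sketch of that manual extraction is shown below. It assumes the parameters are stored under the keys source, medium, and campaign as string values, and it reads one day of the public sample dataset, which, as far as we know, covers November 2020 to January 2021.

```sql
-- Sketch: pull source, medium, and campaign out of event_params.
-- Assumes the keys 'source', 'medium', and 'campaign' hold string values.
SELECT
  event_timestamp,
  event_name,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium,
  (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'campaign') AS campaign
FROM
  `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
WHERE
  -- Keep only the events that actually carry a source parameter.
  EXISTS (SELECT 1 FROM UNNEST(event_params) WHERE key = 'source')
ORDER BY
  event_timestamp
LIMIT 100
```

Removing the WHERE clause shows how many events have no traffic parameters at all.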

This approach has two key limitations. First, it works event by event, which means the information is not consistent across the whole session. Second, it requires an UNNEST() to access the parameters, which adds complexity and increases the risk of errors. On top of that, GA4 does not start a new session when the user returns through a different channel within the session timeout (30 minutes of inactivity by default).

As a result, a single session can contain multiple traffic source values at event level. That reflects the real browsing context, but it complicates the analysis when what you want is a consolidated session-level view.

To get a single session source while working only with event-scoped traffic source data, a common approach is to pick one value and propagate it across all the events in the session. This forces you to make an explicit decision about which event gets the credit, whether that is the first click of the session, the last click, the first non-null value, and so on. Each choice affects how acquisition metrics are interpreted and can create discrepancies between analysts or tools.
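One possible way to implement the "first non-null value" choice is a window function, sketched below under the same assumptions as before (string parameters named source and medium, public sample dataset). Other choices, such as the last non-null value, would only need a different window function.

```sql
-- Sketch: propagate the first non-null source and medium to every event of a session.
WITH events AS (
  SELECT
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    event_timestamp,
    event_name,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source') AS source,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium') AS medium
  FROM
    `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
)
SELECT
  user_pseudo_id,
  ga_session_id,
  event_timestamp,
  event_name,
  -- Each column is resolved independently, so source and medium may come
  -- from different events if only one of them is populated on the first hit.
  FIRST_VALUE(source IGNORE NULLS) OVER session_window AS session_source,
  FIRST_VALUE(medium IGNORE NULLS) OVER session_window AS session_medium
FROM
  events
WINDOW session_window AS (
  PARTITION BY user_pseudo_id, ga_session_id
  ORDER BY event_timestamp
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
```

The explicit window frame matters here: without it, events that occur before the first tagged event would still show NULL.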

Because of all this, this method is mainly used today for date ranges before May 2023, or for very granular analyses where it is necessary to know the exact value present in each event. For most acquisition analyses, GA4 now offers cleaner, more consistent, and easier-to-query structures, which we’ll cover below.

User traffic with traffic_source

The traffic_source struct contains information about the user’s acquisition source. Every event includes this struct, and its values stay the same across all of a user’s events, because they reflect the first campaign or channel through which the user arrived. That makes it a stable field and a useful one for analysing initial user acquisition or building cohorts based on first touch.

It is important to remember that traffic_source does not represent traffic at session level. Because of that, it is not appropriate for session performance analysis, daily traffic KPIs, or reports that aim to reflect the current visit. Its main value lies in understanding how users first arrived at the property and in longer-term acquisition analysis.
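As an illustration, the following sketch counts users by their first-touch source, medium, and campaign. It runs against the public sample dataset for January 2021; the output column aliases are our own naming.

```sql
-- Sketch: users by first-touch acquisition (user scope).
SELECT
  traffic_source.source AS first_user_source,
  traffic_source.medium AS first_user_medium,
  traffic_source.name AS first_user_campaign,  -- "name" holds the campaign name
  COUNT(DISTINCT user_pseudo_id) AS users
FROM
  `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20210101' AND '20210131'
GROUP BY
  first_user_source,
  first_user_medium,
  first_user_campaign
ORDER BY
  users DESC
```

Note that the metric is users, not sessions, which is the natural unit for a user-scoped dimension.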

Event-level traffic with collected_traffic_source

Introduced in May 2023, the collected_traffic_source struct makes event-level traffic source analysis much easier. Unlike event_params, which requires an UNNEST() to extract session attribution parameters, collected_traffic_source exposes these values in a flat and accessible format, ready for direct SQL queries.

This struct is still event by event, so it reflects the exact source of each action, but it removes much of the complexity involved in extracting the data manually from parameters. It also centralises all traffic information in one place, which makes funnel analysis, UTM debugging, or detailed session journey analysis much easier.
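For UTM debugging, one useful check is to look for sessions whose events carry more than one manual source. The sketch below does that; the table name is a placeholder (an assumption) that you would replace with your own export table from May 2023 onwards.

```sql
-- Sketch: sessions whose events carry more than one distinct manual source.
-- The table name is a placeholder; replace it with your own GA4 export table.
WITH session_events AS (
  SELECT
    CONCAT(
      user_pseudo_id,
      CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
    ) AS session_id,
    collected_traffic_source.manual_source AS source
  FROM
    `your-project.analytics_123456789.events_20251201`
)
SELECT
  session_id,
  COUNT(DISTINCT source) AS distinct_sources,
  STRING_AGG(DISTINCT source, ', ' ORDER BY source) AS sources
FROM
  session_events
WHERE
  session_id IS NOT NULL
GROUP BY
  session_id
HAVING
  COUNT(DISTINCT source) > 1
ORDER BY
  distinct_sources DESC
```

No UNNEST() is needed for the traffic fields themselves; it only appears here to read ga_session_id.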

Today, collected_traffic_source is the recommended option for any event-level analysis, unless your date range includes data before May 2023, in which case you would need to rely on event parameters. This structure offers a consistent and efficient way to work with event-level acquisition data without losing the detail that previously was only available through event_params.

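You can use the following query to count the number of sessions by source, medium, and campaign. It credits each session to the first event in that session that carries a manual source, and sessions with no manual source on any event are excluded by the WHERE clause: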

-- The table name is a placeholder; replace it with your own GA4 export table.
WITH prep AS (
  SELECT
    -- Session key: user_pseudo_id plus ga_session_id, cast explicitly to STRING.
    CONCAT(
      user_pseudo_id,
      CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
    ) AS session_id,
    -- Keep the values of the earliest event that has a manual source.
    ARRAY_AGG(
      STRUCT(
        collected_traffic_source.manual_source AS source,
        collected_traffic_source.manual_medium AS medium,
        collected_traffic_source.manual_campaign_name AS campaign
      )
      ORDER BY event_timestamp
      LIMIT 1
    )[OFFSET(0)] AS traffic_source
  FROM
    `your-project.analytics_123456789.events_20251201`
  WHERE
    collected_traffic_source.manual_source IS NOT NULL
  GROUP BY
    session_id
)
SELECT
  traffic_source.source,
  traffic_source.medium,
  traffic_source.campaign,
  COUNT(DISTINCT session_id) AS sessions
FROM
  prep
GROUP BY
  source,
  medium,
  campaign
ORDER BY
  sessions DESC

Session traffic with session_traffic_source_last_click

The session_traffic_source_last_click struct contains session-level acquisition information, which means all the events in the same session will share the same values. This gives you a consistent session-level view of traffic, without the variations that can appear in event-level data.

The assignment follows the last-click / last non-direct click model used by GA4, which makes the values in this struct very close to what you see in the GA4 interface. Because of that, it is the most appropriate option for session acquisition analysis, dashboarding, and reporting that needs to stay aligned with GA4’s standard reports.

This struct simplifies traffic analysis considerably, because it removes the need to propagate or consolidate event-level values and lets you work directly with session metrics. That makes decisions around campaigns, channels, and marketing performance much easier.

Although session_traffic_source_last_click and collected_traffic_source both contain traffic information, the main difference is scope. While collected_traffic_source is event by event and shows the exact source attached to each action, session_traffic_source_last_click consolidates that information at session level and assigns a single value to all the events in the session according to GA4’s last non-direct click logic.

So, if you are looking for analysis that is consistent with session metrics or dashboards that match the GA4 interface, this struct is the better choice. On the other hand, for detailed analysis of individual events or UTM debugging, collected_traffic_source remains the best option.

Below is a query that uses session_traffic_source_last_click to count the number of sessions:

-- The table name is a placeholder; replace it with your own GA4 export table.
WITH prep AS (
  SELECT
    -- Session key: user_pseudo_id plus ga_session_id, cast explicitly to STRING.
    CONCAT(
      user_pseudo_id,
      CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
    ) AS session_id,
    -- All events in a session share these values, so the earliest one is enough.
    ARRAY_AGG(
      STRUCT(
        session_traffic_source_last_click.manual_campaign.source AS source,
        session_traffic_source_last_click.manual_campaign.medium AS medium,
        session_traffic_source_last_click.manual_campaign.campaign_name AS campaign
      )
      ORDER BY event_timestamp
      LIMIT 1
    )[OFFSET(0)] AS traffic_source
  FROM
    `your-project.analytics_123456789.events_20251201`
  WHERE
    session_traffic_source_last_click.manual_campaign.source IS NOT NULL
  GROUP BY
    session_id
)
SELECT
  traffic_source.source,
  traffic_source.medium,
  traffic_source.campaign,
  COUNT(DISTINCT session_id) AS sessions
FROM
  prep
GROUP BY
  source,
  medium,
  campaign
ORDER BY
  sessions DESC

Google Analytics 4 Channel Groups in BigQuery

The session_traffic_source_last_click struct includes a record called cross_channel_campaign, which holds the last-click cross-channel campaign information for the session. Within this record, you will find the fields default_channel_group and primary_channel_group, which reflect how GA4 classifies channels by default and align closely with what you would see in the interface.

These channel groups work differently from the information in collected_traffic_source. While collected_traffic_source captures the source of each event without applying an attribution model, the channel groups are derived from the source that last-click attribution assigns to the whole session, classified with GA4’s channel rules. This can create important differences when comparing metrics with those built manually from event-level traffic sources.
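A minimal sketch of a sessions-by-channel report using default_channel_group could look like this. The table name is again a placeholder (an assumption), and your export must be recent enough to include the cross_channel_campaign record.

```sql
-- Sketch: sessions by GA4 default channel group (session scope, last click).
-- The table name is a placeholder; replace it with your own GA4 export table.
SELECT
  session_traffic_source_last_click.cross_channel_campaign.default_channel_group AS channel_group,
  COUNT(DISTINCT CONCAT(
    user_pseudo_id,
    CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING)
  )) AS sessions
FROM
  `your-project.analytics_123456789.events_20251201`
GROUP BY
  channel_group
ORDER BY
  sessions DESC
```

Because every event in a session shares the same value, no propagation step is needed before counting.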

The choice between using the default channel groups and building your own grouping depends on the objective of the analysis. The default ones provide fast, consistent results that align with the GA4 interface, while a custom grouping allows you to adapt the categorisation to specific needs or more advanced marketing strategies.

If you decide to build your own channel grouping using collected_traffic_source, Google provides the classification rules used to generate the default channel group, which can serve as a useful starting point for your own version.
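To give an idea of what a custom grouping looks like in SQL, here is a deliberately simplified sketch. The rules below are an illustrative subset written for this example, not Google’s full classification, and the sample rows are made up so the query can run without any export table.

```sql
-- Sketch: a simplified, hypothetical channel grouping built with CASE.
-- The sample rows are invented; in practice you would read manual_source and
-- manual_medium from collected_traffic_source instead.
WITH sample_events AS (
  SELECT * FROM UNNEST([
    STRUCT('google' AS source, 'cpc' AS medium),
    STRUCT('google', 'organic'),
    STRUCT('newsletter', 'email'),
    STRUCT('partner-site.example', 'referral'),
    STRUCT(CAST(NULL AS STRING), CAST(NULL AS STRING))
  ])
)
SELECT
  source,
  medium,
  CASE
    WHEN source IS NULL AND medium IS NULL THEN 'Direct'
    WHEN REGEXP_CONTAINS(medium, r'^(.*cp.*|ppc|retargeting|paid.*)$') THEN 'Paid (simplified)'
    WHEN medium = 'organic' THEN 'Organic Search'
    WHEN medium IN ('email', 'e-mail', 'e_mail', 'e mail') THEN 'Email'
    WHEN medium IN ('referral', 'app', 'link') THEN 'Referral'
    ELSE 'Unassigned'
  END AS custom_channel_group
FROM
  sample_events
```

The order of the WHEN branches matters, since the first matching rule wins, so more specific rules should come before broader ones.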

In short, Google Analytics 4 offers multiple ways to capture and analyse traffic, and BigQuery reflects that through different structures with different scopes: user, session, and event.

Each struct, from event_params and collected_traffic_source to traffic_source and session_traffic_source_last_click, has its own advantages and limitations depending on the type of analysis you want to run, whether that is granular event-level analysis, consolidated session analysis, or long-term first-user acquisition analysis.
