Session Reconstruction
GA4 does not store sessions as first-class records in BigQuery — it stores events. Sessions must be reconstructed by grouping events using the user_pseudo_id + ga_session_id composite key. This guide shows how to build a complete session-level table suitable for most analytics use cases.
The session identifier
Section titled “The session identifier”Every event in GA4 (except a few system events) carries two parameters:
ga_session_id— an integer timestamp of session start, in secondsga_session_number— which session this is for the user
The combination of user_pseudo_id + ga_session_id is the unique session identifier. Use this composite key wherever you need to group events into sessions.
-- The session ID for any event(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_idBuilding a sessions CTE
Section titled “Building a sessions CTE”The standard pattern is a CTE that aggregates all events per session and extracts the session-level attributes:
WITH sessions AS ( SELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id, CONCAT(user_pseudo_id, '-', CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING) ) AS session_id,
-- Session metadata MIN(event_date) AS session_date, MIN(TIMESTAMP_MICROS(event_timestamp)) AS session_start, MAX(TIMESTAMP_MICROS(event_timestamp)) AS session_end, TIMESTAMP_DIFF( MAX(TIMESTAMP_MICROS(event_timestamp)), MIN(TIMESTAMP_MICROS(event_timestamp)), SECOND ) AS session_duration_seconds,
-- Engagement MAX( CASE WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN 1 ELSE 0 END ) AS is_engaged, SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) / 1000 AS engagement_seconds,
-- Session number MAX((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number')) AS session_number,
-- Event counts COUNT(*) AS total_events, COUNTIF(event_name = 'page_view') AS pageviews,
-- Conversions COUNTIF(event_name = 'purchase') AS purchases, SUM(CASE WHEN event_name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END) AS revenue,
-- Session-level traffic source (using collected_traffic_source) MAX(collected_traffic_source.manual_source) AS source, MAX(collected_traffic_source.manual_medium) AS medium, MAX(collected_traffic_source.manual_campaign_name) AS campaign, MAX(collected_traffic_source.gclid) AS gclid,
-- First-touch traffic source (user-level) MAX(traffic_source.source) AS first_touch_source, MAX(traffic_source.medium) AS first_touch_medium,
-- Device MAX(device.category) AS device_category, MAX(device.operating_system) AS operating_system, MAX(device.browser) AS browser,
-- Geography MAX(geo.country) AS country, MAX(geo.city) AS city
FROM `project.analytics_PROPERTY_ID.events_*` WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' -- Exclude events without a session ID (some system events) AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') IS NOT NULL
GROUP BY user_pseudo_id, ga_session_id, session_id)
SELECT *FROM sessionsORDER BY session_startAdding landing page to sessions
Section titled “Adding landing page to sessions”The landing page is the first page_view event in the session:
WITH session_pages AS ( SELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location') AS page_location, event_timestamp, ROW_NUMBER() OVER ( PARTITION BY user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') ORDER BY event_timestamp ) AS page_rank FROM `project.analytics_PROPERTY_ID.events_*` WHERE event_name = 'page_view' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'),
landing_pages AS ( SELECT user_pseudo_id, session_id, page_location AS landing_page FROM session_pages WHERE page_rank = 1)
-- Join to sessions CTESELECT s.*, l.landing_pageFROM sessions sLEFT JOIN landing_pages l USING (user_pseudo_id, session_id)Session-level source attribution
Section titled “Session-level source attribution”Getting the session-level source correctly requires care. The top-level traffic_source struct reflects the user’s original acquisition source (first touch), not the current session. Use collected_traffic_source for session-level attribution:
-- Session-level campaign parameters using collected_traffic_sourceWITH session_sources AS ( SELECT CONCAT(user_pseudo_id, '-', CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING) ) AS session_id, user_pseudo_id, collected_traffic_source.manual_source AS source, collected_traffic_source.manual_medium AS medium, collected_traffic_source.manual_campaign_name AS campaign, collected_traffic_source.manual_content AS content, collected_traffic_source.manual_term AS term, collected_traffic_source.gclid, collected_traffic_source.dclid, collected_traffic_source.srsltid FROM `project.analytics_PROPERTY_ID.events_*` WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' AND (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'entrances') = 1)
SELECT *FROM session_sourcesPersisting session data with a permanent table
Section titled “Persisting session data with a permanent table”For repeated analysis, create a permanent sessions table rather than recomputing the CTE each time:
-- Create a sessions table for a specific date rangeCREATE OR REPLACE TABLE `project.analytics_derived.sessions_2024` AS
WITH all_events AS ( SELECT user_pseudo_id, event_name, event_date, event_timestamp, event_params, user_properties, traffic_source, device, geo, ecommerce FROM `project.analytics_PROPERTY_ID.events_*` WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20241231')
SELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, MIN(event_date) AS session_date, -- ... (remaining aggregations as above)FROM all_eventsWHERE (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') IS NOT NULLGROUP BY user_pseudo_id, session_idThis approach dramatically reduces query costs for repeated analysis — you query the derived table (much smaller, no unnesting needed) rather than the raw events tables.
Handling cross-day sessions
Section titled “Handling cross-day sessions”Sessions that start before midnight and end after midnight span two tables (events_YYYYMMDD for each day). The ga_session_id is the same across both days, so grouping by user_pseudo_id + session_id naturally merges them.
However, event_date may differ across events in the same session. Use the minimum event_date as the session date:
MIN(event_date) AS session_dateSession with user properties
Section titled “Session with user properties”Include user properties from the session’s events:
-- Get user property at session timeMAX((SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'subscription_tier')) AS subscription_tierThis gets the user property value as it was set at any point during the session. If the user upgraded mid-session, you may get either value depending on event ordering. For consistency, use the first non-null value:
-- First non-null value seen in sessionMAX(CASE WHEN (SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'subscription_tier') IS NOT NULL THEN (SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'subscription_tier')END) AS subscription_tierComplete session table query
Section titled “Complete session table query”WITH events AS ( SELECT user_pseudo_id, event_name, event_date, TIMESTAMP_MICROS(event_timestamp) AS event_time, event_params, user_properties, traffic_source, collected_traffic_source, device, geo, ecommerce FROM `project.analytics_PROPERTY_ID.events_*` WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'),
sessions AS ( SELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id, CONCAT(user_pseudo_id, '-', CAST((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS STRING) ) AS session_id, MIN(event_date) AS session_date, MIN(event_time) AS session_start, MAX(event_time) AS session_end, TIMESTAMP_DIFF(MAX(event_time), MIN(event_time), SECOND) AS duration_seconds, SUM((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec')) / 1000 AS engagement_seconds, -- Session-level source (collected_traffic_source) MAX(collected_traffic_source.manual_source) AS source, MAX(collected_traffic_source.manual_medium) AS medium, MAX(collected_traffic_source.manual_campaign_name) AS campaign, -- First-touch source (user-level) MAX(traffic_source.source) AS first_touch_source, MAX(traffic_source.medium) AS first_touch_medium, MAX((SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_number')) AS session_number, MAX(CASE WHEN (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') = '1' THEN 1 ELSE 0 END) AS is_engaged, COUNTIF(event_name = 'page_view') AS pageviews, COUNTIF(event_name = 'purchase') AS purchases, SUM(CASE WHEN event_name = 'purchase' THEN ecommerce.purchase_revenue ELSE 0 END) AS revenue, MAX(device.category) AS device_category, MAX(geo.country) AS country, MAX((SELECT value.string_value FROM UNNEST(user_properties) WHERE key = 'subscription_tier')) AS subscription_tier FROM events WHERE (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') IS NOT NULL GROUP BY user_pseudo_id, ga_session_id, session_id)
SELECT session_date, COUNT(*) AS total_sessions, SUM(is_engaged) AS engaged_sessions, ROUND(AVG(duration_seconds), 0) AS avg_duration_seconds, ROUND(AVG(engagement_seconds), 0) AS avg_engagement_seconds, SUM(pageviews) AS total_pageviews, SUM(purchases) AS total_purchases, SUM(revenue) AS total_revenueFROM sessionsGROUP BY session_dateORDER BY session_dateCommon mistakes
Section titled “Common mistakes”Using session_start count instead of unique session IDs
Section titled “Using session_start count instead of unique session IDs”SELECT COUNT(*) FROM ... WHERE event_name = 'session_start' is less reliable than COUNT(DISTINCT CONCAT(user_pseudo_id, '-', CAST(session_id AS STRING))). Some sessions legitimately lack a session_start event. The distinct count is more accurate.
Confusing traffic_source with collected_traffic_source
Section titled “Confusing traffic_source with collected_traffic_source”The top-level traffic_source struct is the user’s first-touch acquisition source — it is the same for every event from that user. For session-level source/medium, use collected_traffic_source which contains the UTM parameters and click IDs for the current session.
Missing sessions that span two days
Section titled “Missing sessions that span two days”If your CTE filters raw events by event_date = '2024-01-15', you miss the tail end of sessions that started on the 14th and continued past midnight. Filter by _TABLE_SUFFIX over a range and group by session ID to capture complete sessions.