Multi-Touch Attribution Models in BigQuery
In November 2023, Google removed four attribution models from the GA4 UI:
- Position-based (first/last click weighted)
- Time-decay
- Linear
- Data-driven
These models are now exclusively available through BigQuery and the Google Analytics Data API via custom implementations. This guide covers SQL implementations for the three deterministic models and discusses why data-driven attribution is harder to replicate.
Why attribution modeling matters
Section titled “Why attribution modeling matters”Attribution answers the question: Which touchpoints deserve credit for a conversion?
Different models tell different stories:
- Last-click: “The final touchpoint drives conversions” (simplest, default in GA4)
- First-touch: “Initial awareness matters most”
- Linear: “All touchpoints contribute equally”
- Time-decay: “Recent interactions matter more”
- Position-based: “First and last touches matter; middle touches split the remainder”
By analyzing each model, you can identify undervalued channels and optimize budget allocation.
Position-based attribution (40/20/40)
Section titled “Position-based attribution (40/20/40)”The position-based model allocates:
- 40% of credit to the first interaction
- 20% to middle interactions (split equally)
- 40% to the last interaction before conversion
SQL implementation
Section titled “SQL implementation”WITH user_sessions AS ( SELECT user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS session_id, collected_traffic_source.manual_source AS source, collected_traffic_source.manual_medium AS medium, event_timestamp, event_name, ROW_NUMBER() OVER ( PARTITION BY user_pseudo_id ORDER BY event_timestamp ) AS event_sequence FROM `project.analytics_PROPERTY_ID.events_*` WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' AND (event_name = 'page_view' OR event_name = 'purchase') -- Adjust based on your conversion event),conversion_paths AS ( SELECT user_pseudo_id, ARRAY_AGG( STRUCT( source, medium, event_sequence, event_timestamp ) ORDER BY event_timestamp ) AS path FROM user_sessions WHERE event_name = 'purchase' -- Only users who converted GROUP BY user_pseudo_id),touchpoint_credits AS ( SELECT user_pseudo_id, source, medium, event_sequence, CASE WHEN event_sequence = ARRAY_LENGTH(path) THEN 0.40 -- Last touch WHEN event_sequence = 1 THEN 0.40 -- First touch ELSE 0.20 / (ARRAY_LENGTH(path) - 2) -- Middle touches (equal split of 20%) END AS credit FROM conversion_paths, UNNEST(path) AS touchpoint WITH OFFSET AS event_sequence WHERE event_sequence < ARRAY_LENGTH(path) -- Exclude the conversion event itself)SELECT source, medium, SUM(credit) AS total_credit, COUNT(DISTINCT user_pseudo_id) AS converting_usersFROM touchpoint_creditsGROUP BY source, mediumORDER BY total_credit DESCThis approach:
- Collects all page_view events per user (ordered by timestamp)
- Identifies only users who converted
- Allocates 40% to the first touch, 40% to the last, and splits 20% equally among middle touches
- Aggregates credit by source/medium
Time-decay attribution
Section titled “Time-decay attribution”Time-decay allocates more credit to recent interactions using an exponential decay function. The most common formula:
credit = 2^(-days_since_conversion / half_life)With a half-life of 7 days:
- Interaction 7 days before conversion: 50% of the weight of the last interaction
- Interaction 14 days before conversion: 25% of the weight of the last interaction
- Interaction 1 day before conversion: ~93% of the weight of the last interaction
SQL implementation
Section titled “SQL implementation”WITH user_conversions AS ( SELECT user_pseudo_id, MIN(event_timestamp) AS first_conversion_timestamp FROM `project.analytics_PROPERTY_ID.events_*` WHERE event_name = 'purchase' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' GROUP BY user_pseudo_id),conversion_journeys AS ( SELECT e.user_pseudo_id, e.event_timestamp, e.collected_traffic_source.manual_source AS source, e.collected_traffic_source.manual_medium AS medium, c.first_conversion_timestamp, TIMESTAMP_DIFF(c.first_conversion_timestamp, e.event_timestamp, DAY) AS days_before_conversion FROM `project.analytics_PROPERTY_ID.events_*` e INNER JOIN user_conversions c USING (user_pseudo_id) WHERE e.event_name = 'page_view' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' AND e.event_timestamp <= c.first_conversion_timestamp AND e.event_timestamp > TIMESTAMP_SUB(c.first_conversion_timestamp, INTERVAL 90 DAY) -- Look back window),time_decay_credits AS ( SELECT user_pseudo_id, source, medium, days_before_conversion, -- Exponential decay with 7-day half-life POW(2.0, -days_before_conversion / 7.0) AS decay_weight FROM conversion_journeys),normalized_credits AS ( SELECT user_pseudo_id, source, medium, decay_weight / SUM(decay_weight) OVER (PARTITION BY user_pseudo_id) AS normalized_credit FROM time_decay_credits)SELECT source, medium, SUM(normalized_credit) AS total_credit, COUNT(DISTINCT user_pseudo_id) AS converting_users, ROUND(SUM(normalized_credit) / COUNT(DISTINCT user_pseudo_id), 4) AS avg_credit_per_userFROM normalized_creditsGROUP BY source, mediumORDER BY total_credit DESCKey concepts:
- Find the first conversion timestamp for each user
- Collect all page_view events leading up to that conversion (within a 90-day lookback window)
- Calculate
decay_weight = 2^(-days_before_conversion / 7)for each touchpoint - Normalize weights so they sum to 1.0 per user (fair allocation per conversion)
- Aggregate by source/medium
You can adjust the half-life (currently 7 days) to reflect your business:
- 3 days: Very recent activity is heavily weighted
- 14 days: Longer consideration cycle
- 30 days: Extended purchase cycle
Linear attribution (equal credit)
Section titled “Linear attribution (equal credit)”Linear attribution gives equal credit to every touchpoint in the conversion path.
SQL implementation
Section titled “SQL implementation”WITH user_conversions AS ( SELECT user_pseudo_id, event_timestamp AS conversion_timestamp, ROW_NUMBER() OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS conversion_number FROM `project.analytics_PROPERTY_ID.events_*` WHERE event_name = 'purchase' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131'),conversion_journeys AS ( SELECT e.user_pseudo_id, e.event_timestamp, e.collected_traffic_source.manual_source AS source, e.collected_traffic_source.manual_medium AS medium, c.conversion_timestamp, COUNT(*) OVER ( PARTITION BY e.user_pseudo_id, c.conversion_number ORDER BY e.event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) + 1 AS touchpoint_number, COUNT(*) OVER ( PARTITION BY e.user_pseudo_id, c.conversion_number ) AS total_touchpoints FROM `project.analytics_PROPERTY_ID.events_*` e INNER JOIN user_conversions c USING (user_pseudo_id) WHERE e.event_name = 'page_view' AND _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' AND e.event_timestamp < c.conversion_timestamp AND e.event_timestamp > TIMESTAMP_SUB(c.conversion_timestamp, INTERVAL 90 DAY))SELECT source, medium, SUM(1.0 / total_touchpoints) AS total_credit, COUNT(DISTINCT user_pseudo_id) AS converting_usersFROM conversion_journeysGROUP BY source, mediumORDER BY total_credit DESCThis is the simplest model: divide 1 conversion equally among all touchpoints.
For example, if a user has a 3-touchpoint path (organic search → direct → paid search → purchase), each touchpoint gets 1/3 credit (0.33).
Data-driven attribution discussion
Section titled “Data-driven attribution discussion”Data-driven attribution is a machine learning model that determines credit allocation based on historical conversion patterns. Google uses this model internally for Google Ads accounts.
Why it’s hard to replicate in SQL
Section titled “Why it’s hard to replicate in SQL”- Training data requirement: You need thousands of conversion paths to train a model
- Feature engineering: Requires extracting features from paths (day of week, time between touches, channel combinations, etc.)
- Counterfactual analysis: Ideally, you’d compare “path with touchpoint X removed” to “path with touchpoint X present” — expensive to compute
- Regularization and tuning: Machine learning models need hyperparameter tuning to avoid overfitting
Alternatives to replicate data-driven attribution
Section titled “Alternatives to replicate data-driven attribution”For practical purposes, consider:
- Logistic regression in BigQuery ML: Train a model to predict conversion probability given channel combinations
- Markov chain model: Simulate the probability of conversion based on state transitions (complex but deterministic)
- Google Analytics 360 Data-Driven Attribution: If you have GA 360, use Google’s built-in model (available via the API)
- Third-party attribution platforms: Venture Ads, Lifesight, and others provide DDA across channels
Here’s a simple logistic regression example in BigQuery ML:
CREATE OR REPLACE MODEL `project.analytics_derived.dda_model` ASSELECT source, medium, campaign_name, CASE WHEN event_name = 'purchase' THEN 1 ELSE 0 END AS convertedFROM `project.analytics_PROPERTY_ID.events_*`WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' AND event_name IN ('page_view', 'purchase');
-- Then predict probabilitiesSELECT source, medium, campaign_name, predicted_converted_probs[OFFSET(1)].prob AS conversion_probabilityFROM ML.PREDICT( MODEL `project.analytics_derived.dda_model`, ( SELECT DISTINCT source, medium, campaign_name FROM `project.analytics_PROPERTY_ID.events_*` WHERE _TABLE_SUFFIX BETWEEN '20240101' AND '20240131' ))This predicts conversion probability by channel combination, which you can use to weight attribution.
Building a comparison dashboard in Looker Studio
Section titled “Building a comparison dashboard in Looker Studio”Create a dashboard comparing all attribution models side-by-side:
- Create a derived table in BigQuery that materializes all three models (position-based, time-decay, linear)
- Schedule a daily refresh using a scheduled query
- Connect Looker Studio to the derived table
- Add cards for each metric:
- Table: Source/Medium with columns for each model’s credit
- Chart: Total credit by model (stacked bar)
- Chart: Model comparison scatter plot (position-based vs. linear, etc.)
Example derived table:
CREATE OR REPLACE TABLE `project.analytics_derived.attribution_models` AS-- UNION all three models with a model_type columnSELECT 'position_based' AS model_type, source, medium, total_credit FROM (SELECT source, medium, SUM(credit) AS total_credit FROM ... )UNION ALLSELECT 'time_decay' AS model_type, source, medium, total_credit FROM (SELECT source, medium, SUM(normalized_credit) AS total_credit FROM ...)UNION ALLSELECT 'linear' AS model_type, source, medium, total_credit FROM (SELECT source, medium, SUM(1.0 / total_touchpoints) AS total_credit FROM ...)Handling direct traffic and null channels
Section titled “Handling direct traffic and null channels”Direct traffic and organic search often appear with NULL or incomplete channel data:
-- Coalesce missing source/medium to sensible defaultsSELECT COALESCE(source, '(direct)') AS source, COALESCE(medium, '(none)') AS medium, total_creditFROM attribution_dataWHERE source IS NULL OR medium IS NULLUse these in your Looker Studio filters to keep direct traffic visible but aggregated separately from paid channels.
Credit allocation formulas with clear SQL examples
Section titled “Credit allocation formulas with clear SQL examples”Example 1: Last-click (baseline)
Section titled “Example 1: Last-click (baseline)”-- Every conversion assigns 100% credit to the last touchpointSELECT source, medium, COUNT(DISTINCT user_pseudo_id) AS creditFROM conversion_last_touchGROUP BY source, mediumExample 2: First-click
Section titled “Example 2: First-click”-- Assign 100% credit to the first touchpoint onlySELECT source, medium, COUNT(DISTINCT user_pseudo_id) AS creditFROM conversion_first_touchGROUP BY source, mediumExample 3: 50/50 (first and last split)
Section titled “Example 3: 50/50 (first and last split)”-- Split credit 50/50 between first and last; ignore middle touchesSELECT source, medium, SUM(0.5) AS creditFROM conversion_endpointsGROUP BY source, mediumRelated resources
Section titled “Related resources”- aliasoblomov/Bigquery-GA4-Queries GitHub — Community SQL examples
- GA4BigQuery.com — GA4 + BigQuery documentation and templates
- Google Analytics Reporting API v4 Attribution — Concepts and definitions
- Looker Studio BigQuery Connector — Dashboard setup guide
Common questions
Section titled “Common questions”Which attribution model should I use?
Section titled “Which attribution model should I use?”Start with last-click (GA4 default) because it’s simple and works for short sales cycles. Experiment with linear for consideration cycles (SaaS, B2B). Use time-decay if recent behavior is most predictive.
How far back should I look (lookback window)?
Section titled “How far back should I look (lookback window)?”Use 90 days as a baseline for most industries. Adjust to match your typical sales cycle: shorter for ecommerce (14-30 days), longer for enterprise (180+ days).
Can I use this for Google Ads optimization?
Section titled “Can I use this for Google Ads optimization?”These SQL models show historical attribution. To optimize bids, use Google Ads’ native data-driven attribution or import GA4 key events as conversion actions in Google Ads.
How do I handle multi-session paths?
Section titled “How do I handle multi-session paths?”The examples above work across all sessions for a user (within the lookback window). If you want session-level paths only, add a filter on session_id instead of user_pseudo_id.