Digital marketing attribution models: A tech survey

April 17, 2025

Author's note: You will notice a lot of formulas in this article.

Formulas alone don't make a model scientific or applicable. This survey summarizes the models as objectively as I could; I will share my own opinions in follow-up articles.

1. Brief history of marketing attribution in digital channels

Early marketing attribution efforts trace back to Media Mix Modeling (MMM) in the mid-20th century, well before the digital era. MMM emerged in the 1950s (though it rose to popularity in the 1980s) as a top-down approach that uses aggregate data and statistical regression to estimate the contribution of each marketing channel to sales [1]. Although MMMs offered high-level budget allocation insights, they lacked the granularity and user-level, real-time detail that digital marketers would later demand.

With the rise of the internet, single-touch attribution models became standard. Last-click attribution (often called last-touch) was especially popular, granting all credit to the final touchpoint before conversion [1]. By the early 2000s, tools like Google Analytics defaulted to last-click models for their simplicity. This approach allowed marketers to measure direct-response channels effectively, but it ignored the influence of any earlier interactions.

As customer journeys became more intricate—spanning multiple channels, devices, and sessions—multi-touch attribution models emerged, distributing credit across more than one interaction. Early versions were still rule-based (for example, splitting credit evenly or favoring the first and last touches). While these were better than single-touch models, they came with inherent biases and fixed rules that do not adjust for interactions or relative importance among channels [2].

Around 2010–2012, data-driven attribution models began appearing. Researchers and analytics teams introduced algorithmic methods (including probabilistic models, Markov chains, logistic regression, and game-theoretic Shapley values) to estimate each channel’s contribution using actual conversion path data [2, 3]. These data-driven models promised more accurate reflection of a channel’s influence, since they relied on patterns learned from the data rather than preset heuristics.

In more recent years, the industry has explored machine learning and AI-driven attribution, including deep learning approaches like recurrent neural networks or attention-based models. These aim to capture complex, nonlinear interactions in long customer journeys [1, 3]. At the same time, there has been a renewed emphasis on incrementality and causal attribution, where controlled experiments or advanced causal inference techniques attempt to isolate the true lift from each channel. Overall, marketing attribution continues to evolve as marketers seek greater accuracy and actionable insight into how digital touchpoints drive conversions.

2. Major attribution models: rule-based vs. data-driven

Marketing attribution models can be broadly split into two categories: rule-based and data-driven. Rule-based models allocate conversion credit by applying a fixed heuristic, while data-driven models rely on algorithms or experiments to learn from the data. We will survey the main models within each category.


Rule-based attribution models

Rule-based models are easy to implement and explain but cannot adapt to unique patterns in the data. They distribute credit according to predefined positions or weights rather than learning from user behavior. Common examples include First-Touch, Last-Touch, Linear, Time Decay, U-Shaped, and W-Shaped.

First-touch attribution

Description: First-touch (or first-click) attribution gives 100% of the credit for a conversion to the first marketing touchpoint in the user’s journey [4]. All subsequent interactions receive zero credit.

Mathematically, if the sequence of touchpoints is indexed \( \{1, 2, \ldots, n\} \) and \( i = 1 \) is the first, the credit allocation can be written as:

\[ \text{Credit}(i) = \begin{cases} 1, & \text{if } i = 1 \\ 0, & \text{otherwise} \end{cases} \]

The first channel essentially gets the entire conversion value (often set to 1 or to the revenue amount).

Use cases and limitations: First-touch is useful for identifying the top-of-funnel channel that introduces prospects to the brand. It highlights which campaigns do the best job of generating awareness or initial engagement. However, it ignores mid-journey or closing interactions entirely, which can be especially problematic for businesses with longer or more complex funnels [4].

Mathematical intuition: The model assumes the initiator of the journey causes the conversion, effectively treating the first contact as 100% responsible. This can be an oversimplification.

Industry usage: First-touch is less popular than last-touch but still used in certain contexts—often to understand which channels drive new leads. Most teams do not rely on it exclusively for budget allocation.


Last-touch attribution

Description: Last-touch (or last-click) attribution assigns 100% of the credit to the final touchpoint in the user’s journey [4]. Earlier interactions receive no credit. If a user’s path is (Organic -> Email -> Paid Search -> Conversion), Paid Search receives full credit as the last click before conversion.

Formally, if \( n \) is the index of the last touch:

\[ \text{Credit}(i) = \begin{cases} 1, & \text{if } i = n \\ 0, & \text{otherwise} \end{cases} \]

Use cases and limitations: Last-touch is straightforward and was the default in many analytics platforms because it is easy to implement and interpret [2]. It is useful for emphasizing bottom-of-funnel tactics that “close” a sale. However, it completely ignores any marketing interactions that occur earlier in the journey, often undervaluing channels that drive initial awareness and mid-funnel nurturing [5].

Mathematical intuition: It assumes the final interaction is decisive—like saying only the last runner in a relay race deserves the gold medal. It is a heuristic that conflates correlation with causation.

Industry usage: Historically the most common attribution model in digital analytics due to its simplicity. Although many platforms (e.g., Google Analytics) have shifted to data-driven or multi-touch models, last-click still appears in many performance reports.


Linear attribution

Description: Linear attribution splits conversion credit equally among all touchpoints in the sequence [4]. If there are \( N \) total interactions leading up to a conversion, each one is allocated \( 1/N \) of the credit. For instance, if a user’s path is (Facebook -> Email -> Organic -> Conversion), each channel gets 33.3% (assuming a single conversion worth 1).

Use cases and limitations: Linear models are used when marketers want a neutral, all-touchpoints-matter perspective. They are easy to explain and ensure no single interaction is ignored [4]. In reality, however, touches are seldom equally influential: a trivial early interaction gets the same credit as a key final push.

Mathematical intuition: Credit is allocated uniformly:

\[ \text{Credit}(i) = \frac{1}{N} \]

where \( N \) is the number of touches in that journey. It is a simplistic notion of “team effort,” ignoring sequence or timing.

Industry usage: Widely available in analytics tools and often used as a baseline multi-touch model. Marketers sometimes pair it with other views (e.g., last-touch, first-touch) to compare how much difference position-based weighting makes.
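As a quick sketch, the three simplest rules covered so far—first-touch, last-touch, and linear—can be expressed as one small helper returning a credit vector (the function name is illustrative, not from any particular library):

```python
def rule_based_credits(n, rule="linear"):
    """Credit vector over n touchpoints for the simplest rule-based models.
    rule: "first" (all credit to touch 1), "last" (all to touch n),
    or "linear" (1/n to each touch)."""
    if n < 1:
        raise ValueError("need at least one touchpoint")
    if rule == "first":
        return [1.0] + [0.0] * (n - 1)
    if rule == "last":
        return [0.0] * (n - 1) + [1.0]
    return [1.0 / n] * n
```

Multiplying the vector by the conversion’s value (1, or the revenue amount) yields each touch’s share.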


Time decay attribution

Description: Time Decay attribution weights recent touchpoints more heavily than older ones, assuming that proximity to conversion implies stronger influence. A common implementation uses an exponential decay function, often with a 7-day half-life. In that example, a touchpoint 7 days before conversion has half the weight of a touchpoint on the day of conversion, 14 days prior has a quarter, etc. A generic formula might be:

\[ \text{Credit}_i = \frac{\frac{1}{2^{(d_i / \lambda)}}} {\sum_{j=1}^{N} \frac{1}{2^{(d_j / \lambda)}}} \]

where \( d_i \) is how many days before the conversion touch \( i \) occurred, and \( \lambda \) is the half-life (for instance, 7 days).

Use cases and limitations: Time decay is appealing when recency is believed to be crucial, so it is common in cases with longer decision cycles [6]. The drawbacks are that choosing a half-life is somewhat arbitrary and that an impactful early interaction might be undervalued simply because it occurred too far from the conversion date.

Mathematical intuition: It models an assumption that influence diminishes over time, akin to a “memory fade.” While more realistic than equal weighting, it remains a heuristic.

Industry usage: Offered in platforms like Google Analytics’ Model Comparison Tool. Often used by marketers who want to emphasize near-conversion touches but still grant some credit to earlier ones.
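The half-life formula above translates directly into code. This sketch (function name is illustrative) normalizes exponentially decaying weights:

```python
def time_decay_credits(days_before, half_life=7.0):
    """Normalized time-decay credits.
    days_before: days between each touch and the conversion (0 = same day).
    A touch half_life days out gets half the raw weight of a same-day touch."""
    weights = [0.5 ** (d / half_life) for d in days_before]
    total = sum(weights)
    return [w / total for w in weights]
```

A journey with touches 14, 7, and 0 days before conversion gets raw weights of 0.25, 0.5, and 1, i.e., shares of roughly 14%, 29%, and 57%.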


U-shaped (position-based) attribution

Description: A U-Shaped attribution model allocates a large portion of credit to the first and last touchpoints, with the remainder divided among any middle touches [7]. A common variant gives 40% credit to the first interaction, 40% to the last, and 20% shared among everything in between. If there are no middle interactions, the first and last each get 50%.

Use cases and limitations: This model is popular when a marketer believes the initial introduction and the final conversion trigger are most important, while mid-funnel touches are supportive. It still relies on an arbitrary weighting (e.g., 40/40/20) that might not reflect actual importance. If there are many mid-funnel touches, each can be left with only a small slice.

Mathematical intuition: Credit distribution is “high at the start, high at the end, lower in the middle,” approximating a funnel emphasis on lead creation and deal closing.

Industry usage: Sometimes called “Position-Based” attribution in Google Analytics or Adobe. Frequently seen in B2B funnels, or any scenario where first and last engagements are assumed especially significant.
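The 40/40/20 variant described above can be sketched as follows (the two-touch case uses the 50/50 convention mentioned earlier; the single-touch case is my assumption):

```python
def u_shaped_credits(n):
    """40/40/20 position-based split over n touchpoints."""
    if n == 1:
        return [1.0]           # a lone touch takes everything (assumption)
    if n == 2:
        return [0.5, 0.5]      # no middle touches: first and last split evenly
    middle = 0.2 / (n - 2)     # 20% shared among the n-2 middle touches
    return [0.4] + [middle] * (n - 2) + [0.4]
```

Note how quickly the middle shares shrink: with eight touchpoints, each of the six middle touches gets just over 3%.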


W-shaped attribution

Description: A W-Shaped model extends U-Shaped by highlighting three key points: first touch, a key mid-funnel milestone, and the final touch. For example, 30% might go to the first interaction, 30% to the mid-funnel event (e.g., an MQL or opportunity creation), 30% to the last touch, and the remaining 10% shared among any other touches [7].

Use cases and limitations: Primarily used in B2B contexts where there is a well-defined mid-funnel milestone. The main limitation is again arbitrariness: it encodes an assumption that exactly these three points matter most, which may or may not be accurate depending on the specific sales process.

Mathematical intuition: Hard-codes three pivotal moments—initial lead, key middle milestone, final sale—and de-emphasizes everything else.

Industry usage: Common in lead-focused B2B organizations. Outside of longer, stage-based funnels, W-Shaped is less relevant.


Data-driven attribution models

Data-driven models learn from historical data or from experiments, rather than imposing a fixed rule. They aim to uncover each channel’s contribution to conversion more objectively. Below are some key approaches:

  • Markov Chain Attribution

  • Shapley Value Attribution

  • Algorithmic/Statistical Models (e.g., Logistic Regression, ML)

  • Incrementality Testing and Lift Models

  • Causal Inference-Based Multi-Channel Attribution

  • Customer Journey-based Deep Learning Models


Markov chain attribution

Description: Markov chain attribution interprets user journeys as state transitions between channels, plus absorbing states for conversion or no-conversion [8]. It calculates transition probabilities between these states from the data and uses a “removal effect” to measure each channel’s importance: the drop in the overall probability of eventually converting when a channel is removed indicates that channel’s share of credit [8, 9].

Use cases and benefits:

  • Captures sequential nature of journeys.

  • Explains results via the “If we remove channel X, how many conversions are lost?” logic, which is intuitive.

  • More nuanced than any single-position rule; channels that primarily appear in converting paths get more credit.

Limitations:

  • Still correlation-based; it does not prove causation.

  • Often assumes a first-order Markov process (memoryless beyond the current state).

  • Can end up distributing credit somewhat “linearly” across many channels, especially if data is limited or channels frequently co-occur [9].

  • Single-touch journeys need special handling (the model might incorrectly reallocate some of that channel’s credit to others if not properly accounted for).

Mathematical intuition: The core idea is computing each channel’s incremental effect by comparing the chain’s absorption probability into conversion with and without that channel.

Industry usage: Widely adopted by data-savvy teams and some attribution vendors. It is relatively straightforward to implement via open-source libraries, and many find it more realistic than rigid heuristics.
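The removal-effect calculation described above can be sketched in a few dozen lines: build a first-order transition matrix from observed journeys, solve for the absorption probability into conversion by fixed-point iteration, and compare it with each channel routed straight to non-conversion. All function names are illustrative:

```python
from collections import defaultdict

def transition_probs(paths):
    """First-order transition probabilities from observed journeys.
    paths: list of (channels, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        seq = ["start"] + list(channels) + ["conv" if converted else "null"]
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    probs = {}
    for a, outs in counts.items():
        total = sum(outs.values())
        probs[a] = {b: n / total for b, n in outs.items()}
    return probs

def conversion_prob(probs, removed=None, iters=200):
    """Probability of absorbing into 'conv' starting from 'start'.
    If removed is set, that channel routes straight to 'null' --
    the heart of the removal-effect calculation."""
    states = set(probs) | {t for outs in probs.values() for t in outs}
    v = {s: 0.0 for s in states}
    if "conv" in v:
        v["conv"] = 1.0
    for _ in range(iters):
        for s, outs in probs.items():
            if s == removed:
                continue
            v[s] = sum(p * (0.0 if t == removed else v[t])
                       for t, p in outs.items())
    return v.get("start", 0.0)

def markov_attribution(paths):
    """Normalized removal-effect credit per channel."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    channels = {c for chs, _ in paths for c in chs}
    effects = {c: base - conversion_prob(probs, removed=c) for c in channels}
    total = sum(effects.values())
    return {c: (e / total if total else 0.0) for c, e in effects.items()}
```

A channel that appears only in non-converting paths ends up with zero removal effect, while channels that gate converting paths absorb most of the credit.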


Shapley value attribution

Description: Based on cooperative game theory, the Shapley value model treats each channel as a “player” contributing to the “game” of achieving a conversion. The Shapley value of a channel is the average marginal contribution that channel provides across all possible subsets of channels [2, 10].

Symbolically, for channel \( i \):

\[ \phi_i = \frac{1}{|\mathcal{N}|!} \sum_{R \in \text{permutations of all channels}} \left( v(P_i^R \cup \{i\}) - v(P_i^R) \right) \]

where \( \mathcal{N} \) is the set of channels, \( R \) is a permutation, and \( P_i^R \) is the subset of channels preceding \( i \) in that permutation. \( v(S) \) is the total conversions (or conversion rate) from the set of channels \( S \).
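For a handful of channels, the formula can be evaluated exactly by enumerating permutations. In this sketch, the coalition value function \( v \) is supplied as a plain dict (a simplifying assumption; in practice it is estimated from conversion-path data):

```python
from itertools import permutations

def shapley_values(channels, v):
    """Exact Shapley values by enumerating all channel orderings.
    v: dict mapping frozenset coalitions to a value (e.g., conversions);
    missing coalitions default to 0. Feasible only for a few channels."""
    phi = {c: 0.0 for c in channels}
    orders = list(permutations(channels))
    for order in orders:
        seen = set()
        for c in order:
            before = v.get(frozenset(seen), 0.0)
            seen.add(c)
            phi[c] += v.get(frozenset(seen), 0.0) - before
    return {c: total / len(orders) for c, total in phi.items()}
```

The efficiency property holds by construction: the per-channel values sum to the value of the full coalition, so conversion credit is fully distributed.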

Use cases and benefits:

  • Regarded as fair in game-theoretic terms, attributing credit based on actual contribution.

  • Naturally accounts for interaction effects; if two channels are valuable only in combination, Shapley will split the synergy between them.

Limitations:

  • Computational complexity grows exponentially with the number of channels. Practical implementations rely on sampling or approximations [10].

  • Order is ignored in classical Shapley (it is set-based, not sequence-based), so timing nuances can be missed unless extended methods are used.

  • Still susceptible to correlation rather than strict causation if the data has hidden biases.

Mathematical intuition: A channel’s Shapley value is its average incremental effect across all possible coalition scenarios.

Industry usage: Adopted in advanced attribution solutions—Google’s Ads Data Hub includes a Shapley-based analysis option, and some “data-driven attribution” products rely on Shapley-like logic [10]. It is seen as a robust approach for multi-touch credit allocation, though the computational demands can be high.


Algorithmic and statistical attribution models (logistic regression & machine learning)

Description: Here, one typically trains a classification model (logistic regression or more advanced ML) to predict whether a user will convert based on which channels or touches they experienced [11, 12]. Coefficients or feature-importance measures can then inform how much each channel contributes to conversions.

  • Logistic Regression:
    \[ \log \left( \frac{P(\text{convert})}{1 - P(\text{convert})} \right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k, \] where \( X_i \) might indicate exposure to channel \( i \).
  • Tree-Based / Ensemble Methods:
    Random forests or gradient boosting can capture nonlinear interactions. Attribution often uses methods like feature ablation (“remove channel X and see how predicted conversion changes”) or SHAP (a model-based Shapley approach for feature importance).
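A bare-bones version of the logistic-regression approach—exposure indicators in, conversion out—might look like this. It is plain batch gradient descent with no regularization, purely illustrative of how channel coefficients are estimated:

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent for logistic regression.
    X: rows of channel-exposure indicators; y: 0/1 conversion labels.
    Returns [intercept, beta_1, ..., beta_k]."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted conversion probability
            grad[0] += p - yi
            for j in range(k):
                grad[j + 1] += (p - yi) * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w
```

A channel whose exposure coincides with more conversions receives a larger positive coefficient; in practice one would add regularization, interaction terms, and user-attribute controls.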

Use cases and benefits:

  • Flexibility to include various features (channel type, number of interactions, user attributes, etc.).

  • Straightforward to measure “incremental lift” if the model controls properly for other factors.

  • Good predictive power if enough data is available.

Limitations:

  • Still observational, so potential for confounding.

  • Must carefully design features and possibly add interactions if using simpler methods.

  • Some ML methods can be “black boxes” unless you use interpretability techniques.

Mathematical intuition:
The model \( f(\cdot) \) estimates \( P(\text{conversion} \mid \text{channels}) \). Attribution is often derived by comparing \( f \) with and without a certain channel, or by analyzing the learned parameters (in logistic regression) or feature importance (in tree ensembles).

Industry usage: Google’s early data-driven attribution featured a bagged logistic regression approach [12]. Adobe’s “Algorithmic Attribution” also uses logistic/econometric methods [11]. Many advanced in-house teams build custom solutions with random forests or XGBoost, especially if they have large volumes of user-level data.


Incrementality testing and lift modeling

Description: Incrementality testing directly measures causal lift via experiments (e.g., user-level holdouts or geo-based tests) [13, 14]. By randomly withholding ads or turning them off in certain regions, one can compare conversion rates in treatment vs. control. The difference (or “lift”) is the incremental effect of that channel.

Use cases and benefits:

  • Gold standard for causation—randomization avoids many biases.

  • Particularly important for channels like branded paid search or retargeting, where observational data might over-attribute conversions.

  • Ideal for calibrating or validating other attribution models.

Limitations:

  • Costly: you have to hold out real prospects or regions from marketing, which can mean lost revenue.

  • Some channels (like organic search or national TV) are difficult to randomize.

  • Results apply only to the tested budget/time period and may not generalize perfectly.

Mathematical intuition: A straightforward difference in conversion between treatment and control:

\[ \text{Lift} = P(\text{conversion} \mid \text{treatment}) - P(\text{conversion} \mid \text{control}) \]

Statistical inference (e.g., t-tests, Bayesian methods) then assesses whether the observed difference is statistically significant.
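The lift computation plus a standard two-proportion z-test fits in a few lines (a sketch; a production analysis would also report confidence intervals and account for multiple tests):

```python
import math

def lift_test(conv_t, n_t, conv_c, n_c):
    """Absolute lift and two-proportion z-score for a holdout experiment.
    conv_*: number of converters; n_*: group sizes (treatment t, control c)."""
    p_t, p_c = conv_t / n_t, conv_c / n_c
    lift = p_t - p_c
    p_pool = (conv_t + conv_c) / (n_t + n_c)          # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    return lift, lift / se
```

For example, 120 conversions among 1,000 treated users vs. 90 among 1,000 controls is a 3-point lift with z ≈ 2.19, significant at the usual 95% level.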

Industry usage: Large advertisers like Facebook, Google, and Wayfair routinely advocate or offer lift test tools [13, 14]. Many sophisticated marketing teams rely on incrementality experiments to confirm whether a channel truly drives net-new conversions.


Multi-channel attribution using causal inference (uplift modeling & double ML)

Description: Causal inference methods attempt to estimate each channel’s causal effect from observational data, correcting for confounders.

  • Uplift Modeling: Learns the difference in conversion probability if a user is “treated” (exposed to a channel) vs. if not. Often uses separate models for treated/control or specialized “meta-learners” (T-learner, X-learner, etc.) [15].

  • Double Machine Learning (DML): Uses machine learning to flexibly model both the outcome and the treatment assignment, then obtains a “debiased” effect of the channel [16].

Use cases and benefits:

  • Seeks causal effects without requiring continuous or universal experimentation.

  • Identifies which segments truly benefit from exposure.

  • Potentially more accurate than naive correlation-based models if assumptions hold.

Limitations:

  • Requires strong assumptions about unobserved confounders, or partial randomization.

  • Implementation is more complex; results can be sensitive to model specifications.

  • Handling multiple channels as multiple “treatments” can demand large datasets.

Mathematical intuition: In uplift modeling, we estimate:

\[ \text{Uplift}(x) = P(Y = 1 \mid T = 1, x) - P(Y = 1 \mid T = 0, x), \]

where \( T \) is treatment (channel exposure) and \( x \) is a user’s features. Double ML uses partialling-out or residual-on-residual regressions to remove confounding influences.
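As a toy illustration of the T-learner idea, the two per-arm "models" below are just empirical conversion rates within each user segment; real implementations fit ML models over rich feature vectors instead. All names are hypothetical:

```python
from collections import defaultdict

def t_learner_uplift(rows):
    """Toy T-learner: one 'model' per arm, here simply the empirical
    conversion rate per segment; uplift is their difference.
    rows: (segment, treated, converted) tuples, treated/converted in {0, 1}.
    Assumes every segment has observations in both arms."""
    stats = defaultdict(lambda: [0, 0, 0, 0])   # conv_t, n_t, conv_c, n_c
    for seg, treated, converted in rows:
        s = stats[seg]
        if treated:
            s[0] += converted
            s[1] += 1
        else:
            s[2] += converted
            s[3] += 1
    return {seg: s[0] / s[1] - s[2] / s[3] for seg, s in stats.items()}
```

The output directly answers the targeting question: which segments actually convert more when exposed, rather than which segments convert most overall.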

Industry usage: Growing interest in advanced analytics teams at larger tech or retail firms. Tools like Microsoft’s EconML library implement these methods, but adoption remains niche due to complexity.


Customer journey-based deep learning models

Description: Deep learning approaches use neural networks (RNNs, LSTMs, Transformers with attention) to process the entire sequence of touchpoints and predict conversion [3, 17, 18]. The model’s learned parameters (e.g., attention weights) or post-hoc interpretability methods (like integrated gradients) can then yield attribution scores.

Use cases and benefits:

  • Potentially captures nonlinear interactions, temporal patterns, and synergy among channels without manually specifying them.

  • Attention layers can offer a built-in mechanism to see which touchpoints matter most.

  • Can outperform simpler models in large-scale data scenarios.

Limitations:

  • Data-hungry; requires large volumes of user journeys.

  • Typically a black box from a causal perspective: improved predictive accuracy does not guarantee correct causal attributions.

  • Interpretability, training complexity, and overfitting risk remain challenges.

Mathematical intuition:
A deep network \( f \) learns \( P(\text{conversion} \mid \text{sequence of touches}) \). In attention-based models, a set of weights \( \alpha_i \) is produced for each touchpoint \( i \), which can be interpreted as that touchpoint’s contribution to the final prediction.
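The attention weights \( \alpha_i \) come from a softmax over per-touchpoint relevance scores. The normalization step alone looks like this (the scores here stand in for what a trained network would produce):

```python
import math

def attention_weights(scores):
    """Softmax over per-touchpoint relevance scores. The weights sum to 1,
    so they can be read as each touchpoint's share of the prediction."""
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Because softmax is monotonic, the touchpoint the network scores highest always receives the largest attribution share; whether that share reflects causal influence is exactly the open question noted in the limitations above.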

Industry usage: Still more common in research or at highly data-driven enterprises (e.g., large e-commerce or ad tech). Not a standard offering in most out-of-the-box attribution tools, but it is likely to gain traction as AI methods mature.


Conclusion (if you don't question the premises of these models)

Digital marketing attribution has come a long way, from simplistic heuristics (first-touch or last-touch) to more sophisticated, data-driven approaches (Markov chains, Shapley value) and beyond. Each model offers a different perspective on which interactions matter most and why. Rule-based methods remain popular for simplicity and clarity, while data-driven approaches attempt to learn and reflect the true influence of each channel.

Incrementality experiments (lift testing) and causal inference models address the central challenge of causation vs. correlation, increasingly becoming the gold standard in advanced marketing analytics. Meanwhile, deep learning promises to capture rich, complex patterns in multi-channel journeys, though it often lacks the causal guarantees of experimental methods.

No single model is universally “best.” Savvy organizations typically combine or compare multiple methods, validating their assumptions with controlled experiments. As privacy changes and new channels emerge, the future of attribution likely involves hybrid solutions: real-time statistical and machine learning methods guided by periodic experimental calibrations.



References

  1. Molloy, J. (2023). What Is Marketing Attribution? A Complete Guide. Corvidae – History of attribution from MMMs to multi-touch to AI.

  2. QueryClick (2021). A Comparison of Attribution Models: Shapley and Markov vs. Corvidae – Discussion on limitations of simple models and intro to Shapley & Markov.

  3. Zweig, J. (2023). Multi-Touch Attribution: From Traditional Models to Deep Learning Approaches. – Covers formulas for last click, first click, linear, plus an introduction to LSTM and transformer approaches.

  4. Strong.io – Explanation of First-Touch and Last-Touch.

  5. Adobe Blog (2017). Algorithmic Attribution: Choosing the Model Right for Your Company. – Commentary on how last-click ignores earlier channel interactions.

  6. Workamajig. How To Choose A Proven Marketing Attribution Model. – Describes the 7-day half-life approach in time decay.

  7. Dreamdata Documentation. Types of Attribution Models. – Explains U-Shaped and W-Shaped position-based splits (e.g., 40-40-20 for U-Shaped, 30-30-30-10 for W-Shaped).

  8. ChannelMix (2021). Markov Chains for Marketers: A Quick-Start Guide. – Introduction to building Markov chain transition models for marketing.

  9. Adequate Digital. Markov Chain Attribution Modeling [Complete Guide]. – Detailed look at computational aspects, bias, and pitfalls in Markov-based models.

  10. SegmentStream Blog. Marketing Attribution Models Explained. – Overview of Shapley Value and game-theoretic multi-touch attribution.

  11. Adobe Blog (2017). Algorithmic Attribution: Choosing the Attribution Model That’s Right for Your Company. – Describes logistic regression (econometric) modeling.

  12. Shao, X. and Li, L. (2011). Data-Driven Multi-Touch Attribution Models. Google – Introduces bagged logistic regression and conditional probability methods for attribution.

  13. Wayfair Tech Blog (2023). How Wayfair Uses Geo Experiments to Measure Incrementality. – Explains geo-based holdouts to test marketing lift.

  14. Leavened (2023). Measuring Incremental Lift with Attribution Models. – Defines incremental lift and how it differs from naive attributions.

  15. Mosca, A. (2023). Uplift Modeling: Predict the Causal Effect of Marketing Communications. Medium – Explains how to estimate incremental effect at the user level.

  16. Kögel, H. (2023). Causal Machine Learning in Marketing. Medium – Introduction to Double Machine Learning (DML) for unbiased effect estimation.

  17. Li, N. et al. (2018). Deep Neural Net with Attention for Multi-channel Multi-touch Attribution. arXiv – Attention-based approach to attribute each touch in a sequence.

  18. Strong.io Blog (2023). Modern Attribution Models: Deep Learning Approaches (LSTM & Transformer). – Demonstrates how RNNs and attention can model complex journeys.
