Why data-driven marketing attribution models don't work as promised

Tue Mar 11 2025

From huge enterprises to niche e-commerce brands, every marketer grapples with the same fundamental question: Where should we invest our marketing budget for the greatest return?

Ideally, you’d like a tidy calculation that says, “Channel A accounts for 25% of conversions, Channel B for 40%, Channel C for 10%,” and so on. That’s the allure of marketing attribution: pinpointing exactly which touchpoints move the needle and which are mere distractions.


As digital data collection exploded over the past two decades, a wave of sophisticated, “data-driven” models—Markov chains, Shapley values, algorithmic multi-touch—arrived, promising to solve attribution once and for all. Yet despite powerful mathematics and fancy dashboards, many practitioners end up disappointed. Gordon et al. (2022) provided evidence that non-experimental approaches do not perform well at removing known biases in the data, even with a rich dataset. This article lays out the qualitative reasons why.

The problem: Evaluating marketing spend in a complex landscape

At its heart, attribution is an effort to connect marketing spend to outcomes. Digital channels seemed to bring the perfect conditions for measurement: everything from click-through rates to time-on-site to retargeting exposures could be tracked. Marketers envisioned a crystal-clear view of who clicked what, on which day, in what order, until a conversion finally happened. If you could just capture that chain of events in detail, you could reliably calculate how each step contributed to the final purchase.

As the number of channels multiplied—search ads, display, email, social, video, affiliate, in-app promotions—so did the complexity of the user journey. No single “touch” might be enough to make a user buy. Instead, that purchase might be the cumulative effect of brand impressions, remarketing ads, reviews, influencer posts, or purely offline factors. Recognizing this, vendors started offering data-driven attribution, proclaiming an objective, quantitative way to distill messy path data into actionable insights.

What data-driven models promise

Under the hood, many of these solutions rely on Markov Chain, Shapley Value, or algorithmic approaches—sometimes a blend of the three. While they differ in mathematical nuance, they share a common story:

  1. Holistic Multi-Touch: Rather than attributing everything to the first or last click, these models look across the entire user journey.

  2. Mathematical Rigor: Markov chains estimate conversion probabilities by simulating what happens if a channel is removed; Shapley value credits each channel for its marginal contribution across all combinations; and machine learning uses predictive algorithms or logistic regressions to uncover which channels correlate with conversions.

  3. More Accurate Decision-Making: Because these approaches incorporate data on many paths (including converting and non-converting journeys), they claim to be fairer and less biased than static rules. Supposedly, you know exactly where to push or pull marketing spend for maximum impact.

  4. Granular Attribution: Whereas older “media mix” methods could only talk about aggregate effects (e.g., how much TV vs. radio), data-driven systems appear to offer user-level, channel-by-channel, even touch-by-touch insights.
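The removal-effect idea behind Markov-chain attribution can be sketched in a few lines. Everything below is a toy illustration: the channel names and journey data are invented, and the model is a simple first-order chain where "removing" a channel means treating it as a dead end.

```python
from collections import defaultdict

# Hypothetical journeys: a channel sequence plus whether it converted.
# Real path data would come from an analytics pipeline.
paths = [
    (["search", "display", "email"], True),
    (["display", "email"], True),
    (["search"], False),
    (["social", "search", "email"], True),
    (["social"], False),
    (["display"], False),
]

def transition_counts(paths):
    """Count first-order transitions from 'start' through each channel to conv/null."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        seq = ["start"] + channels + ["conv" if converted else "null"]
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def conversion_probability(counts, removed=None, state="start", depth=0):
    """P(eventually reaching 'conv'), treating a removed channel as absorbing."""
    if state == "conv":
        return 1.0
    if state == "null" or state == removed or depth > 20:  # depth guard for cyclic data
        return 0.0
    total = sum(counts[state].values())
    if total == 0:
        return 0.0
    return sum(n / total * conversion_probability(counts, removed, nxt, depth + 1)
               for nxt, n in counts[state].items())

counts = transition_counts(paths)
base = conversion_probability(counts)
for channel in ["search", "display", "email", "social"]:
    without = conversion_probability(counts, removed=channel)
    print(f"{channel}: removal effect = {1 - without / base:.2f}")
```

In this toy dataset every converting path ends in email, so email's removal effect is 1.0—a small preview of the last-touch bias discussed later.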

These features create a narrative that, with enough data, the models will tease out the hidden truth of which channels and tactics deserve credit. Stakeholders are left expecting a precise, definitive map of ROI across their entire marketing portfolio.
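The Shapley calculation can be made concrete with an equally small sketch. The coalition values below (conversion rates for users exposed to each subset of channels) are hypothetical numbers chosen for illustration; real systems estimate them from path data.

```python
from itertools import permutations

# Hypothetical coalition values: conversion rate observed for users exposed
# to exactly this subset of channels.
v = {
    frozenset(): 0.00,
    frozenset({"search"}): 0.04,
    frozenset({"display"}): 0.02,
    frozenset({"email"}): 0.03,
    frozenset({"search", "display"}): 0.07,
    frozenset({"search", "email"}): 0.08,
    frozenset({"display", "email"}): 0.05,
    frozenset({"search", "display", "email"}): 0.10,
}

def shapley(channels, v):
    """Average each channel's marginal contribution over all arrival orders."""
    phi = {c: 0.0 for c in channels}
    orders = list(permutations(channels))
    for order in orders:
        seen = frozenset()
        for c in order:
            phi[c] += v[seen | {c}] - v[seen]  # marginal value of adding c
            seen = seen | {c}
    return {c: phi[c] / len(orders) for c in channels}

credits = shapley(["search", "display", "email"], v)
print(credits)
```

By construction the credits sum exactly to the full-coalition value (the "efficiency" property), which is part of the method's appeal—and, as discussed below, part of why its allocations can swing when v is estimated from sparse or shifting data.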

Where they fall short in reality

Such precise guidance remains elusive. Most teams that attempt purely data-driven attribution models eventually run into fundamental limitations that undermine the dream of “one model to rule them all.”

1. Incomplete or Noisy Data: Despite digital marketing’s promise of perfect measurability, the real world is full of cracks: cross-device usage, ad blockers, missing cookies, users who convert offline after a final online touch, or marketing tactics that don’t generate trackable clicks (like word of mouth or display impressions that aren’t clicked). Markov Chains, Shapley, and ML-based models assume that if a channel’s impact exists, it’s reflected in user-path data. But in practice, that data often omits key pieces of the puzzle, skewing attributions.

2. Correlation vs. Causation: Both Markov and Shapley revolve around “removing” a channel from the equation, measuring the difference in conversion probability, and assigning the delta back to that channel. This works only if you trust the historical data to represent a genuine cause-and-effect relationship. Many channels appear late in a journey simply because that’s where people are most likely to convert anyway (e.g., branded search). They get credit in these models but might not be truly driving demand.

3. Oversimplified User Journeys: First-order Markov Chains assume the next step depends only on the immediate prior channel; Shapley doesn’t even consider timing (it’s about all possible subsets). Reality is more fluid—brand perception may have formed weeks earlier, influenced by experiences that never show up in analytics. Any model that lumps user behavior into a neat chain or set combination can struggle to represent long-term awareness or intangible influences, leading to an overemphasis on recently observed channels.

4. Hidden or External Factors: Pricing changes, competitor moves, economic conditions, seasonal cycles—none of these are typically visible in user-level marketing data, yet they can drastically alter conversion outcomes. When a competitor’s website experiences downtime, your conversions go up, and a data-driven model might erroneously anoint your retargeting ads as the hero. If the dataset can’t capture the real cause, the model’s results look solid mathematically but don’t match the true underlying reason for the spike.

5. Unstable Allocation: Shapley-value credits can shift dramatically when channels are correlated or when a new channel is added to the mix. Markov Chains similarly can produce unstable attributions if data is sparse on certain channel sequences. Minor changes in user behavior or new campaign launches might yield big changes in assigned credit, making the model’s guidance harder to trust or apply consistently in budgeting.

6. Experimentation Is Still Essential: The best way to determine true incremental lift is through A/B tests or geo-level experiments. By design, Markov and Shapley rely on retrospective correlation, and machine learning can only guess how removing a channel might affect user behavior. True lift requires actively withholding marketing for a control group, and these big tests can be complex to run. When experimental results clash with the model’s claims, it becomes obvious that purely observational approaches can’t fully “prove” causality.
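The incremental-lift logic in point 6 is, by contrast, a simple calculation once an experiment has run. The sketch below uses invented counts for a hypothetical geo-holdout (regions where the campaign ran vs. regions where it was withheld) and a standard two-proportion z-test for significance.

```python
import math

# Hypothetical geo-holdout results: campaign shown vs. withheld.
treatment = {"users": 50_000, "conversions": 1_300}
control = {"users": 50_000, "conversions": 1_000}

p_t = treatment["conversions"] / treatment["users"]
p_c = control["conversions"] / control["users"]
lift = p_t - p_c  # absolute incremental lift

# Standard error of the difference between two proportions
se = math.sqrt(p_t * (1 - p_t) / treatment["users"]
               + p_c * (1 - p_c) / control["users"])
z = lift / se  # |z| > 1.96 suggests significance at the 5% level

print(f"lift = {lift:.4f} ({lift / p_c:.0%} relative), z = {z:.1f}")
```

Unlike the observational models above, this number is causal by design: the only systematic difference between the two groups is the marketing itself.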

Conclusion

Despite the elegant math and intuitive dashboards, data-driven attribution models often overpromise and underdeliver when confronted with real-world complexities. They capture correlations in your existing dataset but can’t always disentangle the deeper forces driving a user’s decision. Tracking gaps, external influences, and the fundamental gap between observation and causation mean these models typically fall short of providing a definitive roadmap for marketing spend.

Yet all is not lost. Used carefully, Markov chains, Shapley value, and similar machine-learning methods can still offer directional insights about user paths, channel synergy, and potential areas of optimization—so long as you understand their blind spots. Combining these models with well-designed incrementality tests, controlled experiments, or even top-down market mix analyses can provide checks and balances. In short, there’s no single “plug and play” approach to perfect attribution. Recognizing the limitations of data-driven methods is the first step to using them wisely and focusing on the insights that truly add value.
