Is Your Analytics the Eyeball or Factory Kind?

October 24, 2025      Kevin Schulman, Founder, DonorVoice and DVCanvass

You run an A/B test: the Control pulls 5%, the Test pulls 5.3%. Maybe the test is better; maybe it’s just noise.

That’s the tension we live with: trying to separate signal from noise and reduce uncertainty when picking winners and losers. So ask:

If this result isn’t real, where did the noise come from?

  • Was it the process — the way we ran the test?
  • Or the outcome — donors responding unpredictably, caught in timing, inbox clutter, mood, and randomness?

I bet the answer for most fundraisers is “outcome.” After all, the testing process has probably been replicated tens, hundreds, even thousands of times.

Next question: how are you analyzing your test results to know whether you should bet on the test winning or losing? It’s probably one of these two:

  • Eyeball test – bigger number wins.
  • Lab/Factory testing – if someone talks about p-values or confidence intervals, or says, “we’re 95% confident this result didn’t happen by chance,” you’re factory testing, a method born in the early 20th century. In stat land, you’re a Frequentist.

I’d wager that if analysis is done at all, it’s of the factory variety, and it’s built on this logic:

  • Each test is independent
  • We start with no history
  • The process is random; the result is fixed

None of these match our world. The August appeal is influenced by July’s, we have tons of history, and it’s a crappy bet to think the process is the noisy part.

This is why so many “significant” lifts vanish when repeated: the analysis makes a crappy bet for you, because the logic and math describe a world that doesn’t exist.

What’s the Alternative?

Choose logic and math that match our world. In stat land this is called a Bayesian view, named after the 18th-century mathematician Thomas Bayes. His insight: we already know something, and the test we ran today can update that existing knowledge. Bayesians assume the process is pretty solid; the outcomes are noisy. And the update itself is one line of arithmetic, sketched just after the list below.

Bayesian thinking fits the world we live in:

  • The process isn’t random; we designed it.
  • The outcome is random; donors are.
  • History matters — use it.
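
A minimal Python sketch of that update, using the standard Beta-Binomial shortcut (a Beta prior plus observed results gives the new Beta posterior by simple addition); the prior numbers here are purely illustrative:

```python
# Beta-Binomial conjugate update: a Beta(alpha, beta) prior plus observed
# results gives the posterior directly, no heavy machinery needed.
def update(alpha: float, beta: float, successes: int, trials: int):
    """Add responses to alpha, non-responses to beta."""
    return alpha + successes, beta + (trials - successes)

# Illustrative: history worth ~2,000 mailings at a ~4.8% response rate,
# updated with 192 responses out of 4,000 new mailings.
print(update(96, 1904, 192, 4000))  # -> (288, 5712): posterior mean still ~4.8%
```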

A real example: Direct Mail A/B test

  • Samples: A = 4,000, B = 4,000

  • Observed response: A = 4.8% (192/4000), B = 5.3% (212/4000)

  • Observed lift: B − A = +0.5 percentage points (pp)

Frequentist Analysis

  • Question it answers: If there were no real difference, how unusual is a gap this big?

  • Two-sample proportion z-test (pooled):

    • z ≈ 1.02, p ≈ 0.31 (not “significant” at 0.05)

  • 95% CI for lift (B − A): [−0.46 pp, +1.46 pp]

Conclusion: We can’t rule out “no true difference.” Result is inconclusive; try again or increase n.
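
If you want to check those numbers yourself, here’s a minimal sketch of the frequentist calculation in Python (scipy is assumed for the normal CDF), using the counts from the example:

```python
from math import sqrt
from scipy.stats import norm

# Observed results from the example above
conv_a, n_a = 192, 4000   # Control A: 4.8%
conv_b, n_b = 212, 4000   # Test B: 5.3%
p_a, p_b = conv_a / n_a, conv_b / n_b

# Two-sample proportion z-test with a pooled response rate
p_pool = (conv_a + conv_b) / (n_a + n_b)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se_pool
p_value = 2 * norm.sf(abs(z))   # two-sided

# 95% CI for the lift uses the unpooled standard error
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
lo, hi = (p_b - p_a) - 1.96 * se, (p_b - p_a) + 1.96 * se

print(f"z = {z:.2f}, p = {p_value:.2f}")    # z ≈ 1.02, p ≈ 0.31
print(f"95% CI: [{lo:+.2%}, {hi:+.2%}]")    # ≈ [−0.46 pp, +1.46 pp]
```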

Bayesian Analysis 

  • Control prior (A): “Controls typically run 4.6–5.0%.”
    Represent as Beta prior with mean 4.8% and “prior sample size” n₀ = 2000
    (this encodes strong history about A).

  • Test prior (B): “New creative; broad expectation 3–7%.”
    Represent as Beta prior with mean 5.0% and n₀ = 200
    (weak, skeptical, wide).

The analysis includes history:

  • Posterior simulations yield:

    • P(B > A | data, priors) ≈ 0.865 (≈ 86.5%)

    • 95% credible interval for lift (B − A): [−0.37 pp, +1.36 pp]


Conclusion: We ran an analysis that used all our history on how the control and tests tend to perform, and we used this single test result to update our understanding of the world. The net, net? There’s an ~87% probability that B is truly better, but the lift is likely small and could be near zero.
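
Here’s a minimal simulation sketch of that Bayesian analysis in Python (numpy only). The priors are the Beta distributions implied by the means and prior sample sizes above; the seed and simulation count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility
sims = 1_000_000

# Encode each prior as Beta(mean * n0, (1 - mean) * n0)
alpha_a, beta_a = 0.048 * 2000, 0.952 * 2000   # Control A: strong history
alpha_b, beta_b = 0.050 * 200, 0.950 * 200     # Test B: weak, skeptical, wide

# Conjugate update (add responses to alpha, non-responses to beta),
# then draw from each posterior
post_a = rng.beta(alpha_a + 192, beta_a + 4000 - 192, sims)
post_b = rng.beta(alpha_b + 212, beta_b + 4000 - 212, sims)

lift = post_b - post_a
lo, hi = np.percentile(lift, [2.5, 97.5])
print(f"P(B > A) = {(lift > 0).mean():.3f}")             # ≈ 0.865
print(f"95% credible interval: [{lo:+.2%}, {hi:+.2%}]")  # ≈ [−0.37 pp, +1.36 pp]
```

Swap in your own counts and priors, and the same dozen lines answer the question you actually care about: what’s the probability the test is truly better?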

Forget the math, which conclusion feels more informative and useful?

In the frequentist world, the logic treats a test as if it dropped from the sky, ignoring the hundreds of others that came before it. A Bayesian view says your past results are data too: they form a prior, an informed expectation, and this new test is another data point that updates that expectation. You’re not pretending the world resets every time you test; you’re learning across time.

This translates to more informed decision-making. If most of your tests lose, the Bayesian view keeps you honest — fewer false winners. If your baseline win rate is strong, it does the opposite: it rescues a few false losers that your old frequentist lens would have written off.

And the best part? This can all be DIY.  GPT or any LLM can do a Bayesian analysis faster and better than those dusty online statistical significance calculators that should be relegated to the fax machine museum.

Kevin