Why Your Fundraising Test Probably “Won” by Luck & What to Do About It

August 25, 2025      Kevin Schulman, Founder, DonorVoice and DVCanvass

It’s possible — maybe even likely — that your “winner” in a point-in-time fundraising A/B test didn’t win because of your headline, photo, or copy change. It probably won because of random noise.

Before you yell at your screen that I’m an idiot (maybe true, but not for this reason), walk with me through the logic and the math.

Very little giving happens without an ask. Stop soliciting your donors and giving decays fast toward zero. That means the baseline effect of simply showing up and asking is massive. But once that baseline is in place, the incremental effect of a single campaign change is usually tiny. Most gifts are a maintenance effect: you keep asking, they keep giving. Swapping a subject line or image rarely moves the needle in a way that breaks through the noise.

The noise in your data

When I hear the word noise, I tend to think it's random, but that's not quite accurate. The noise isn't random in the universe; it's random to you and me. It's the set of things you don't control but that swamp the impact of most A/B testing:

  • Giving cadence habits: Some donors give once a year, no matter what.
  • Income and cash flow: Bonuses, bills, unexpected expenses.
  • Life events: Moves, births, deaths, illnesses.
  • Mental bandwidth: Calm Saturday mornings vs. hectic weekday evenings.
  • Conflicting priorities: Another charity beat you to the donor's wallet this week.

These forces are endogenous to your appeal schedule: they interact with when and how you ask. But they aren't influenced by your photo swap or envelope color. They're large, structural forces, and they overwhelm the impact of incremental test ideas.

Let’s ground this in a simple acquisition response-rate example:

  • Two test cells of 20,000 names each
  • Control response rate: 0.5% → about 100 responders
  • Standard two-sample proportion test, 95% confidence, 80% power

What lift do you actually need to detect a real effect?

  • Treatment needed: about 0.72% response, ≈ 144 responders
  • Absolute lift: +0.22 percentage points (0.50% → 0.72%)
  • Relative lift: roughly +44% (0.72% vs. 0.50%)
  • Incremental responders: about +44 in treatment vs. control

Plain English: With 20k names per cell and a 0.5% baseline, you need roughly 44 more gifts in treatment (100 → 144) to be confident it wasn’t luck. A 5–10% “bump” from a copy tweak or image swap won’t clear that bar.
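
If you want to check these numbers yourself, here's a minimal sketch, assuming the standard normal-approximation formula for a two-sided, two-sample proportion test (scipy is the only dependency; the figures above are roughly what it reproduces):

```python
# Minimum detectable treatment rate for a two-sample proportion test.
# Standard normal-approximation sample-size formula; illustrative only.
from scipy.stats import norm
from scipy.optimize import brentq

def required_n_per_cell(p1, p2, alpha=0.05, power=0.80):
    """Per-cell sample size for a two-sided two-proportion test."""
    z_a = norm.ppf(1 - alpha / 2)   # 1.96 at 95% confidence
    z_b = norm.ppf(power)           # 0.84 at 80% power
    p_bar = (p1 + p2) / 2
    term = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
            + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5)
    return term ** 2 / (p2 - p1) ** 2

# Smallest treatment rate detectable with 20,000 names per cell
# and a 0.5% control response rate.
p1, n = 0.005, 20_000
p2 = brentq(lambda p: required_n_per_cell(p1, p) - n, p1 + 1e-6, 0.05)
print(f"Detectable treatment rate: {p2:.4%}")           # ~0.72%
print(f"Relative lift needed:      {p2 / p1 - 1:.0%}")  # ~+44%
```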

Now, here’s the part the textbook leaves out and why real life is harsher:  In a tidy model, every donor is like an independent coin flip with the same probability of responding. Real life isn’t tidy. Responses cluster because circumstances cluster:

  • One drop hits early in the week; another bunch lands Friday.
  • Some list sources are “hotter” than others.
  • Some weeks, the physical mailbox is crowded with competing appeals; other weeks it’s quiet.

This clustering means results bounce around more than the clean model predicts; statisticians call that overdispersion. Practically, it shrinks your effective sample size, so the lift you need to detect a real effect gets even larger than the +44% above.
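
To see overdispersion concretely, here's a hypothetical simulation (the beta parameters are invented purely for illustration): both worlds average a 0.5% response, but in the messy one each campaign's true rate drifts with circumstances like drop timing and list mix.

```python
# Tidy binomial world vs. a messy, clustered world with the same
# average response rate. All parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n, p, sims = 20_000, 0.005, 2_000

# Tidy model: every donor is an independent coin flip at 0.5%.
tidy = rng.binomial(n, p, size=sims)

# Messy model: each campaign's true rate is drawn from a beta
# distribution with the same 0.5% mean but extra spread.
a = 10.0
b = a / p - a                     # keeps the mean at a/(a+b) = 0.5%
messy = rng.binomial(n, rng.beta(a, b, size=sims))

print("SD of responders, tidy :", tidy.std().round(1))   # ~10
print("SD of responders, messy:", messy.std().round(1))  # ~33
```

Same mean, roughly triple the spread. That extra bounce is what shrinks your effective sample size and pushes the detectable lift above the +44% from the clean formula.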

And zooming out one level further: overdispersion is just the statistical footprint of bigger, structural noise—income timing, life events, competing priorities, attention—that you don’t observe but that swamps most incremental testing.

Even if your p-value says “significant,” in this environment small apparent wins are often just the world’s messiness masquerading as your genius.

Why “statistical significance” isn’t enough

Most fundraisers lean hard on p-values. But a significant result at 95% confidence doesn’t mean your creative tweak truly caused the lift. Here’s why:

  • Fragility at low base rates: A handful of donors shifting groups can swing the outcome from “win” to “loss.”
  • Variance in amounts: One $5,000 donor randomly in treatment makes it “win” on revenue. That’s noise, not signal.
  • Multiple testing: Run enough A/Bs and some will show p < .05 by chance alone; at the 95% level, about 1 in 20 no-difference tests will (see the sketch after this list).
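
The multiple-testing arithmetic is worth seeing once. Each null test has a 0.95 chance of staying quiet, so the odds of at least one fluke compound quickly:

```python
# Chance of at least one false "win" among k independent A/B tests
# when there is truly no difference, each run at alpha = 0.05.
for k in (1, 5, 10, 20):
    print(f"{k:>2} tests -> {1 - 0.95 ** k:.0%} chance of a fluke win")
# 1 -> 5%, 5 -> 23%, 10 -> 40%, 20 -> 64%
```

Run twenty no-difference tests and there's roughly a two-in-three chance at least one looks significant; on average, one in twenty will.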

In other words, stat sig doesn’t rescue you from noise when the noise is this big and the base rate this small.

So what should you do?

  • Separate maintenance from incremental effects.
    Run suppression tests occasionally (intentionally don't solicit a random slice, or turn a channel's spend to $0) to measure how much giving disappears without an ask. That's your baseline; everything else is incremental.
  • Aggregate results.
    Don’t hang your hat on a single test. Pool results across campaigns or over time. Replication smooths out noise and lets small true effects emerge.
  • Reduce variance before testing.
    This feels counterintuitive — marketers usually like variance because it gives them “something to explain.” But in testing, variance is the enemy. Mixing $25 donors and $2,500 donors in one cell doesn’t give you more insight; it just adds noise that makes modest lifts undetectable. Segmenting into more homogeneous groups reduces that noise, which makes any real lift easier to see. Then you can compare results across groups to understand where effects are universal versus group-specific.
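
Here's what pooling can look like in practice: a minimal sketch with hypothetical counts from six repeated tests of the same treatment. No single campaign clears significance, but the pool does.

```python
# Pooling responder counts across repeated A/B tests.
# The (responders, mailed) counts below are invented for illustration.
from scipy.stats import norm

control   = [(98, 20_000), (105, 20_000), (92, 20_000),
             (101, 20_000), (97, 20_000), (103, 20_000)]
treatment = [(113, 20_000), (120, 20_000), (107, 20_000),
             (116, 20_000), (112, 20_000), (118, 20_000)]

def pooled_ztest(a_cells, b_cells):
    """Two-proportion z-test on counts summed across campaigns."""
    xa, na = map(sum, zip(*a_cells))
    xb, nb = map(sum, zip(*b_cells))
    diff = xb / nb - xa / na
    pooled = (xa + xb) / (na + nb)
    se = (pooled * (1 - pooled) * (1 / na + 1 / nb)) ** 0.5
    return diff, 2 * norm.sf(abs(diff / se))   # lift, two-sided p

lift, pval = pooled_ztest(control, treatment)
print(f"Pooled lift: {lift:+.3%}, p = {pval:.3f}")
# Any one campaign alone: z ~ 1.0, p ~ 0.3 (no "win").
# Pooled across six: p ~ 0.012, and the small true effect emerges.
```

The counts are made up, but the pattern is the point: replication converts a lift too small for any single test to detect into one the pooled data can confirm.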

Because randomness is always in the room with you, the smart move is to design for it, not be fooled by it.

Kevin