You’ll never look at your A/B test the same…

August 25, 2016      Kevin Schulman, Founder, DonorVoice and DVCanvass


Back in my salad days of working for a nonprofit (which was as of three weeks ago; time moves pretty fast nowadays), I was looking back at test results of a campaign we ran with our agency partner.

We did 15 panels of 20,000 each, with the same RFM segments in equal proportions across panels. Think of it as an A/B test, except that this was an A/B/C/D/E/F/G/H/I/J/K/L/M/N/O test. I’d generated the list request myself based on the previous year, when we’d run a test of five different things crossed with three other things, hence the 15 test panels.

Here were the (only slightly) changed results:

[Image: table of the false test results, by panel]

What are your conclusions from this?  There are a few things that jump out at me:

  • E2 is a strong winner, largely because of the spike in average gift; it could make 14% more than rolling out with A2.
  • C as a concept had a relatively strong showing, with three of the top seven panels. It would be worth looking into what was done there to see whether it can be replicated.
  • A, on the other hand, had three of the six worst showings and the two lowest response rates. Don’t do what you did in A again.

Here’s the trick.  All of these panels were the same audience and received the same piece.

Yes, this 15-panel test was all an accident.  It was the first time I’d done a data request from our database vendor, so I left the panels set up the same as they had been for the piece the previous year, when we actually did run a five-by-three test.

And yet there were significant swings in response rates and average gift, just like you’d see in a regular test.

In fact, if someone told you that A2 was the control and E2 was the test, you’d be hard-pressed not to call E2 the winner and roll out with it:

                Response rate    Average gift    Revenue per piece
Test (E2)           3.62%           $27.43             $0.99
Control (A2)        3.33%           $25.93             $0.86

That’s what we had done the previous year.  We had run a 15-panel test, picked a “winner” and that “winner” was the new control.  And the results varied by about as much as they did when everyone got the same piece in the mail.
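For a sense of how easily chance alone produces a gap that size, here’s a quick two-proportion z-test in Python, a rough sketch that assumes the 20,000-piece panels described above (response rate only; average gift is noisy in exactly the same way):

import math

# Response rates from the table above; the 20,000 pieces per panel
# comes from the test setup described earlier.
n = 20000
test_rr, control_rr = 0.0362, 0.0333
test_resp = round(test_rr * n)        # ~724 responses
control_resp = round(control_rr * n)  # ~666 responses

# Pooled two-proportion z-test for the difference in response rates.
pooled = (test_resp + control_resp) / (2 * n)
se = math.sqrt(pooled * (1 - pooled) * (2 / n))
z = (test_rr - control_rr) / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")  # about z = 1.58, p = 0.11

That’s a difference that would not clear a conventional significance bar, and that’s before you remember there were 13 other panels in the mail at the same time.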

What lessons can you take from this?

First, there is a lot of what statisticians call ‘noise’ in any set of test results.  What does this mean in practice?

You have undoubtedly deemed test panels winners that were not and, on the flip side, test panels “losers” that were not.  And it’s not just you; I know from looking at this that I have as well.  The vast majority of test results have no winners and losers, just noise that we mislabel as signal.
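To see this for yourself, run a quick simulation: give 15 identical panels the identical piece and the identical underlying response rate, and sampling noise alone will still hand you apparent winners and losers. A rough sketch in Python (the 3.5% true rate and 20,000-piece panels are assumptions for illustration):

import random

random.seed(42)

TRUE_RATE = 0.035    # the same underlying response rate for every panel
PANEL_SIZE = 20000
PANELS = 15

# Every panel gets the identical treatment; only sampling noise differs.
observed = []
for _ in range(PANELS):
    responses = sum(random.random() < TRUE_RATE for _ in range(PANEL_SIZE))
    observed.append(responses / PANEL_SIZE)

best, worst = max(observed), min(observed)
print(f"best panel: {best:.2%}, worst panel: {worst:.2%}")
print(f"apparent lift of 'best' over 'worst': {best / worst - 1:.0%}")
# Typical runs show roughly a 10-15% gap between the best- and
# worst-looking panels, even though nothing about them differs.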

How to avoid this?

Set up your list selects correctly.  Take it from someone who had to aggregate 15 test panels to get the actual results he was looking for.
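If you ever end up in the same spot, the cleanup itself is mundane but worth doing right. Here is a minimal sketch in Python/pandas, with a hypothetical file name and column names rather than the real data layout, for collapsing accidental panels back into the cells that were actually mailed:

import pandas as pd

# Hypothetical export of results by panel code; the file name and
# column names are assumptions, not the real data layout.
results = pd.read_csv("campaign_results_by_panel.csv")
# columns: panel_code, pieces_mailed, responses, revenue

# Map each accidental panel back to the treatment it actually received.
# In this campaign every panel got the same piece, so they all collapse
# into a single cell.
results["treatment"] = "same_piece"

rollup = results.groupby("treatment").agg(
    pieces=("pieces_mailed", "sum"),
    responses=("responses", "sum"),
    revenue=("revenue", "sum"),
)
rollup["response_rate"] = rollup["responses"] / rollup["pieces"]
rollup["avg_gift"] = rollup["revenue"] / rollup["responses"]
rollup["rev_per_piece"] = rollup["revenue"] / rollup["pieces"]
print(rollup)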

But, more importantly, this can teach us about why we test.  It’s easy to do an A/B test of a red envelope versus a blue envelope.  But it won’t get you to a deeper understanding of your donors’ behavior, and you might have gotten the same “results” even if everyone had received the red envelope by mistake.

Random testing that yields random results keeps you thinking about how to maximize the value of a communication rather than the value of a donor. It’s when you think about how to treat donors, and about what causes them to do what they do, that you can engage with strategy, build donor loyalty, and maybe, just maybe, make your donors happier with their experiences.

Failing that, however, there is still a strong case to be made for retesting previous tests, backtesting (looking at your hypothesis against retrospective data), using much larger test quantities, and keeping a general sense of skepticism about how your test results could be wrong.  After all, even a result with p = .04 had a 4% chance of showing up by chance alone if there were no real difference.  And when you test many panels over the course of a year, you are going to have much more randomness, chance, and noise mixed in with your wins and losses.
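To put a number on that multiple-testing problem: compare 14 test panels against a control, each at the p < .05 threshold, and the odds that at least one of them falsely “beats” the control are roughly even. A quick sketch (treating the comparisons as independent, which is a simplification):

# Family-wise false-positive risk when many panels are each compared
# to one control at the same threshold.
alpha = 0.05
comparisons = 14  # 15 panels = 14 head-to-head comparisons vs. a control

p_at_least_one_false_winner = 1 - (1 - alpha) ** comparisons
print(f"{p_at_least_one_false_winner:.0%}")  # about 51%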

This also shows the value of a strong hypothesis.  Each idea you test should come with a reason to believe in it before you start. You want a hypothesis of the form “I believe this test will [increase response rate/increase average gift/increase donor lifetime value/some combination] because of X,” so that you know what you are going to measure against, instead of looking at a table like mine and picking ‘winners’ based on wherever a bigger (and false, but alluringly so) number appears.

It also helps you create the test correctly. If you have a strong hypothesis, you can make sure you get the details right to support or reject it.
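One detail a clear hypothesis lets you pin down in advance is quantity. Here is a rough sample-size sketch in Python, using the normal-approximation formula for two proportions and, as assumptions, the E2 vs. A2 response rates above with conventional 80% power and a two-sided .05 significance level:

import math

# How many pieces per panel would it take to reliably detect a real
# lift from 3.33% to 3.62% response?
p1, p2 = 0.0333, 0.0362
z_alpha, z_beta = 1.96, 0.84  # two-sided alpha = .05, 80% power

n_per_panel = ((z_alpha + z_beta) ** 2
               * (p1 * (1 - p1) + p2 * (1 - p2))
               / (p2 - p1) ** 2)
print(f"{math.ceil(n_per_panel):,} pieces per panel")  # roughly 62,500

That is more than triple the 20,000-piece panels in the test above, which is one more reason small but real differences get lost in the noise.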

What you want, ideally, is a robust testing tool like our pre-testing platform, so you can roll out with a winning test before you ever mail a piece.  We’d love for you to learn more here.

And you can always learn more with our free newsletter below.

Sign up for tips today!