TESTING: When A/B Tests Attack (your results)

January 4, 2018      Admin

In yesterday’s post, we talked about A/B tests that show a result when all that’s really there is noise.

Today, we’ll flip that on its head: sometimes an A/B test shows no result, but there’s an important finding just below the surface.

An example of this is a great study by Karlan and Wood looking at whether emotion or education was a more potent factor in mail appeals.  The study is here.

To oversimplify, the control group got an emotional appeal and a personal story about a participant in the nonprofit’s program; the test group received the same letter, plus an additional paragraph talking about the “rigorous scientific methodologies” on which the nonprofit’s program was based.

The study found that the information on program effectiveness had no impact on either the likelihood of giving or the amount given.  So, case closed: it doesn’t matter whether you talk about your program’s effectiveness or not.

But wait.

The researchers found an interesting split in the data: effectiveness data significantly harmed response among smaller (under $100) donors (0.6 percentage points lower response rate) and helped response among larger ($100+, but you probably guessed that) donors (one percentage point higher response rate). With controls in place for things like household income, previous gifts, etc., the researchers were able to reject the idea that larger and smaller donors behave the same.
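If you want to probe your own data for this kind of split, one common approach is a regression with a treatment-by-segment interaction term. The sketch below is not the model Karlan and Wood actually fit; the data are simulated and every column name is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 20_000

# Simulated donor file; column names are illustrative only.
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),         # 1 = received the effectiveness paragraph
    "large_donor": rng.integers(0, 2, n),     # 1 = prior giving of $100 or more
    "household_income": rng.lognormal(11.0, 0.5, n),
    "prior_gift_count": rng.poisson(3, n),
})

# Build in opposite treatment effects by donor size, purely for illustration.
base_rate = 0.05 + 0.01 * df["large_donor"]
lift = np.where(df["large_donor"] == 1, 0.01, -0.006) * df["treated"]
df["gave"] = (rng.random(n) < base_rate + lift).astype(int)

# The treated:large_donor interaction coefficient is the test of whether
# larger and smaller donors respond differently to the same appeal.
model = smf.logit(
    "gave ~ treated * large_donor + np.log(household_income) + prior_gift_count",
    data=df,
).fit(disp=False)
print(model.summary().tables[1])
```

A significant interaction term is the statistical version of “larger and smaller donors do not behave the same,” even when the main treatment effect washes out to roughly zero.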

So what looked like “no result” actually exposed an important difference between donors.  One might say (and I do) that this highlights a dichotomy in how people give: smaller gifts are heart gifts; larger gifts are head gifts.  Or you can go all Kahneman System 1 and System 2 on me, if you prefer.

This is almost certainly happening in your file as well.  We’ve talked about cat people versus dog people as a simplified example of different donor identities with different wishes in the same donor file.

Let’s say you tested an all-dog-all-the-time mail piece against a full donor file that is half cat people and half dog people.  The cat people would reject the piece, the dog people would embrace it, and as long as they rejected and embraced it in equal numbers and proportions, the result would look like no improvement.
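To make that arithmetic concrete, here is a minimal simulation sketch. The response rates and the 50/50 cat/dog split are made up for illustration; nothing here comes from a real donor file.

```python
import numpy as np

rng = np.random.default_rng(42)
n_per_group = 10_000          # hypothetical file: half cat people, half dog people

control_rate = 0.05           # assumed baseline response to the control piece
dog_test_rate = 0.06          # dog people respond a point better to the all-dog piece
cat_test_rate = 0.04          # cat people respond a point worse

dog_resp = rng.random(n_per_group) < dog_test_rate
cat_resp = rng.random(n_per_group) < cat_test_rate
control_resp = rng.random(2 * n_per_group) < control_rate

print(f"Aggregate test response:    {np.concatenate([dog_resp, cat_resp]).mean():.2%}")
print(f"Aggregate control response: {control_resp.mean():.2%}")
print(f"Dog people on test piece:   {dog_resp.mean():.2%}")
print(f"Cat people on test piece:   {cat_resp.mean():.2%}")
```

The aggregate test and control rates land within noise of each other, so the topline reads “no improvement” even though each subgroup moved a full point in opposite directions.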

Identity and average gift aren’t the only fault lines along which A/B tests can hide real results.  We did a test of six additional cultivation pieces to new donors for an international nonprofit.  No result… until we looked at more committed donors versus less committed donors. 

Those donors who were highly committed to the organization had their retention go down by nine points when they received six additional communications versus none. They said things like, “Stop convincing me; I’m already convinced.”

When we looked at low-commitment donors, the six additional communications corresponded to a 12-point increase in retention. They said things like, “I believe you do important work, but I actually don’t know you well.” The study is discussed in more detail here.

Commitment level can even impact the age-old debate over happy versus sad faces in imagery.  Xiaoxia Cao and Lei Jia tested what types of faces worked best in charity ads. They found that people who were highly psychologically involved with a nonprofit wanted to see happy faces, whereas those who weren’t as involved donated more if they saw sad faces.

This is not to say you should cut your data into tiny chunks to try to get a significant result.  That’s p-hacking and it’s intellectual malpractice.
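To see why, here is a small simulation with pure noise: the “test” does nothing for anyone, and we slice the same file forty arbitrary ways anyway. (All data here are simulated; the forty-slice count is arbitrary.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20_000

gave = rng.random(n) < 0.05                   # everyone responds at 5%, period
treated = rng.integers(0, 2, n).astype(bool)  # the "test" piece has no effect at all

false_positives = 0
for _ in range(40):
    seg = rng.integers(0, 2, n).astype(bool)  # a meaningless, random "segment"
    table = [
        [np.sum(gave & treated & seg), np.sum(~gave & treated & seg)],
        [np.sum(gave & ~treated & seg), np.sum(~gave & ~treated & seg)],
    ]
    _, p_value, _, _ = stats.chi2_contingency(table)
    false_positives += p_value < 0.05

print(f"'Significant' slices found in pure noise: {false_positives} of 40")
```

Roughly two of those forty slices will clear p < .05 on luck alone. The splits worth trusting are the few you can anchor to something real, like commitment or identity, before you look at the results.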

That said, meaningful segmentation means that segments will behave differently.  (This usually occurs around commitment, identity, or both.)

If you look for these types of fissures in your data, you might gain some unique insights. Insights that a one-size-fits-all testing outlook would miss.

Nick

3 responses to “TESTING: When A/B Tests Attack (your results)”

  1. Nick, This is great data to share. I am very appreciative of all the research that you have been putting out there for us to consider.
    And I always wonder: at what point does one study become more than one study? Perhaps you could add a bit more on how we might evaluate the applicability of single research projects or tests. I’m assuming a recommendation would be to test those for your own organization to see how they work, though it would seem a more comprehensive understanding of the other conditions (like time of year, segment selected, ranking in high vs. low loyalty, etc.) might also be helpful.

  2. Thanks – glad it’s helpful. When to react to these new bits of research is always a good question. A few rules of thumb I use:
    – How strong is the evidence for it? Peer-reviewed pieces, while not perfect, certainly hold more interest for me than those that aren’t, because they’ve gone through an extra vetting step and are usually by people who know how to craft a study. That said, within peer-reviewed pieces, I default to tests with actual fundraising results when possible. So I’d take Karlan and Wood’s study over Cao and Jia’s, because the former actually mailed pieces and got donations (the latter measured donation intent). For non-peer-reviewed pieces, I look at the robustness of the results. For example, the difference between high- and low-commitment donors discussed above was tested over the course of a year with strictly controlled randomized groups, so I feel good about the strength of the findings.
    – How applicable are the results to you? If you are in international relief, I’d get very excited about Karlan and Wood’s results, because they tested it with Freedom From Hunger – there’s probably a lot of cross-applicability within that sector. Or if you see results from an end-of-year mailing, you may feel most comfortable testing it in your end-of-year mailing.
    – How much of an effect could it have for you? Let’s say you do stat-heavy appeals everywhere. You look at Karlan and Wood and the myriad of studies on the identifiable victim effect and see that a successful test could change the way you communicate with donors. That should likely be your priority over, say, positioning your donation amount against a hedonic good (which, while it has an effect, won’t transform your communications).
    – How easy is it to test? Can I run it on my online donation page? Is there a test already running there? If the answers are yes and no, respectively, then let’s put it up to test! If it will take my legal counsel signing off and an act of Congress, then the potential impact has to be pretty big to be worth looking at.
    – Does it excite you? This is so subjective that I hesitate to put it in, but the best tests I’ve seen are the ones where people can’t wait to see the results. There are hundreds of studies you could draw from right now, and it can be a glut. So, all other things being equal, you might as well test the ones that get you up in the morning.