TESTING: Baby Steps are for Babies

January 3, 2018 Admin

(aka The Myth of Testing One Variable at a Time)

After our earlier posts on the dangers of overly simplistic testing, you might despair of getting any legitimate test results for your file, given the noise in the data and the large quantities required.

There is, however, good news. If you embrace larger-scale testing, you can break the famous “test only one thing at a time” rule.

Let’s say that instead of trying to get the highest response rate possible, you were trying to reach the highest point on Earth. The trick is that you don’t know where it is, how high it is, or whether you are in a place that’s already pretty high or right at sea level.

Testing one variable at a time is like going out your door with a good set of binoculars. You climb to the highest point you can see, then look again and climb to the next highest point you can see. You repeat until you are at the highest point in view.

It’s a good iterative approach that will get you higher than you were before (or verify you were high already). But if you started in Indiana, your best hope would be to get to the top of Hoosier Hill (seriously), a whopping 1257 feet high. This is a local maximum. You are “optimized,” in that you can’t get higher by doing anything nearby.

Simba, everything the light touches is our kingdom…

What you really want is a global maximum, the best you can possibly do. But unless you are in the Himalayas already, local optimization will not get you there. The odds of you being in the Himalayas, of any possible land location on Earth, are vanishingly small.

So if you want to get to Everest, local optimization – changing one variable at a time – won’t do it.
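For the technically inclined, here is a minimal sketch of that trap. Everything in it is a made-up illustration rather than real donor data: the `height` function is a toy landscape with a low hill near (2, 2) and a tall peak near (8, 8), and `climb` stands in for changing one variable, one small step at a time.

```python
import math

def height(x, y):
    """Toy landscape (hypothetical): a low hill near (2, 2), a tall peak near (8, 8)."""
    low_hill = 1.0 * math.exp(-((x - 2) ** 2 + (y - 2) ** 2) / 4.0)
    tall_peak = 5.0 * math.exp(-((x - 8) ** 2 + (y - 8) ** 2) / 4.0)
    return low_hill + tall_peak

def climb(start, size=11):
    """Move one coordinate by one step at a time; stop when no neighbor is higher."""
    x, y = start
    while True:
        neighbors = [(x + dx, y + dy)
                     for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                     if 0 <= x + dx < size and 0 <= y + dy < size]
        best = max(neighbors, key=lambda p: height(*p))
        if height(*best) <= height(x, y):
            return (x, y)  # nothing nearby is higher: a local maximum
        x, y = best

# Incremental testing from where you already stand (the "Indiana" start).
local_top = climb((1, 1))   # ends at (2, 2), the low hill
# The same small steps, but starting from a very different place.
leap_top = climb((7, 6))    # ends at (8, 8), the tall peak

print(local_top, round(height(*local_top), 2))
print(leap_top, round(height(*leap_top), 2))
```

The climbing rule is identical in both runs. Only the starting point changes, and the starting point is what a one-variable-at-a-time test can never move very far.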

To optimize globally, you must try some different (sometimes very different) things. This starts with a large-scale hypothesis about your donors (e.g., instead of responding to this part of our mission, they will respond to a different part), followed by a wholesale communication test in sufficient quantity to see if it works.

This is the equivalent of spinning the globe and putting your finger on a location. Here’s the trick: two-thirds of the Earth’s surface is covered in water. This type of global testing is risky. Even if you have good knowledge about your donors and could limit your search to just land masses, you could end up in Florida, where the highest point is 345 feet tall and where, according to Carl Hiaasen novels, you are in peril of wacky hijinks ensuing.

I digress. The point is that these large-scale risks are scary. Hoosier Hill isn’t looking too bad, especially with the budget they are looking for you to hit. But if you do end up in the Rockies, Alps, Himalayas, etc. with your big risks, then global optimization can help you go from foothills to serious peaks.

There are some ways to mitigate your risks while still testing big ideas. One is to take a portfolio approach to your risk. Kevin Schulman recommends allocating 15% of your volume to large-scale testing and 10% to incremental improvements. This allows you to progress up the hill you are on while you scout the terrain for other worlds to conquer.

Another significant route to the peaks is to have a clear idea of why your donors give. In earlier posts we’ve stressed the importance of learning about your donors through the onboarding process and the types of information you’d want to collect.

If you have information like the identities of your donors, their commitment levels, and what makes them committed to your organization, your risk in creating a communication that is very different from your usual one is significantly decreased. In fact, it’s exactly the type of approach you should be taking.

Finally, there are ways to test multiple variables simultaneously before a donor sees a live communication. The Agitator Toolbox contains one such solution.

Using a panel of your donors, advanced survey techniques, and some sophisticated statistical analysis, this tool can tell you what images, messages, and themes are worth testing in live media (and which aren’t). No wonder one user described it as “18 months worth of testing in a day.”

So, to the idea of testing one variable at a time: it’s a fine idea as far as it goes. But to get to the best communication you can possibly have, you need to take broader, higher leaps.

How is your experience with broader testing?

Nick

Feedback

4 responses to “TESTING: Baby Steps are for Babies”

Claire Axelrad says:

January 3, 2018 at 4:02 am

And I’m back to yesterday’s question: What if you are among the 90% of nonprofits with budgets under $1 million and lists of less than 10,000 names (and usually much smaller donor lists)? They truly can only test one variable. And even then, I’d guess a statistician would proclaim the results not “statistically significant.”

What’s your best advice for these folks?

erica waasdorp says:

January 3, 2018 at 10:03 am

great question Claire. I’ve learned that you need at least 100 responses to be significant. so if you can mail 10,000 names, 50/50 split, 5,000 each, you’d need a 2% response to be significant, so if the control generates 2% or higher and the test generates 2.5% , it’s a significant lift.
what’s important though is that this should then be a cross section of all names, which with small orgs is hard… the 10,000 could be including a lot of lapsed names.

I think it’s much more important to figure out what you’re testing…Never mind the color of the envelope, the teaser on the envelope.. I’d go for something that’s going to bring in more money or more donors.

so if you’re looking to generate as many new donors as possible, ask amount is the key element to test. low ask generates higher response.

if you’re looking to increase the average gift for your existing donors, again ask amount is the key element… use ask strings or specific asks.

in my 35+ years in direct response, ask strings are the number 1 items to test as they will have the highest impact on your results. and it costs virtually nothing to do production wise.

Nick Ellinger says:

January 3, 2018 at 10:42 am

Claire, I’d say the admonition not to test incremental things, but rather to focus on core things like donor identities and reasons for giving is even greater with a smaller list. Since you have a limited testing budget and ability to test, you are going to have to make changes that go beyond the incremental. Agree with Erica – the envelope isn’t going to make substantial changes. And, as mentioned yesterday, don’t be afraid to test concepts across multiple communications.

Erica, 100 responses is a great rule of thumb. If you are looking for statistical significance, I’d still recommend the confidence interval tool (http://www.evanmiller.org/ab-testing/sample-size.html) discussed yesterday, because effect size and sample size trade off (if you have a smaller effect, you are going to need a larger sample to prove it out and vice versa).
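As a rough illustration of that trade-off, here is a minimal sketch using the standard normal-approximation formula for comparing two response rates. It is not necessarily the formula the linked tool uses. The function name `sample_size_per_cell` is hypothetical, and the 5% two-sided significance level and 80% power are assumed defaults; the 2% and 2.5% response rates come from the example earlier in this thread.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_per_cell(baseline, lift, alpha=0.05, power=0.80):
    """Approximate names needed per cell to detect `lift` over `baseline`.

    Two-sided, two-proportion z-test under a normal approximation.
    alpha and power are assumed defaults, not values from the post.
    """
    p1, p2 = baseline, baseline + lift
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / lift ** 2)

# Same 2% control, two different lifts: the smaller lift needs far more names.
small_lift = sample_size_per_cell(0.02, 0.005)  # 2.0% vs 2.5%
large_lift = sample_size_per_cell(0.02, 0.010)  # 2.0% vs 3.0%
print(small_lift, large_lift)
```

Under these assumptions, halving the lift you hope to detect roughly quadruples the names needed per cell, which is why small lists are better spent on tests likely to produce big differences.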

Ask strings are a great thing to test because they are simple to do and can have a big impact (and work well online, so you can set up a test, direct your Google Grants there, and wait until you have the level of verification you want). For a smaller organization, I think I’d still focus more on who people are and why people give, in part because there are some things that are fairly proven in ask strings that you can roll out with. At the risk of self-promotion, I did a long ask string white paper (or short ask string book) available free at http://www.thedonorvoice.com/the-science-of-ask-strings/ that can help. Some things like circling (or defaulting to online) an ask amount greater than your average gift or using most recent contribution instead of highest previous contribution have enough evidence behind them that you may feel comfortable just doing them.

But ask strings also work well in combination with other tests, so you could do it simultaneously with another test; you’d just be doing four test cells of 2K instead of two test cells of 4K across communications (for example).

Claire Axelrad says:

January 4, 2018 at 3:27 am

Thanks Erica and Nick. This is useful advice. I like having a rule of thumb, and 100 is a good number. And I’m going to download the ask string book so I can share that advice. 🙂
