The Curse of Testing Illiteracy

October 26, 2016      Roger Craver

Spurred on by my post The Curse of Fundraising Innumeracy, reader Mikaela King over at the National Geographic Society decided to “dog pile” on with what she termed “another illiteracy” in our sector – testing illiteracy.

Mikaela noted, “A lack of discipline in conducting accurate A/B split testing, truly ensuring randomized segments, making sure your test segments are large enough to ensure statistical significance, only testing one element at a time (unless you’re intentionally testing a completely different offer), holding all other factors constant, only calling a test after it’s achieved statistical significance and only extrapolating the conclusions that were proven by the test. We have to ‘bootstrap’ a lot in our industry to meet our budgets and grow our programs, but if I hear one more time about some amazing test results, only to see later that the test was fatally flawed and the results are unreliable…”

Of course she’s right. So I might as well tackle this “curse” next.

Let’s face it. Everyone talks about ‘testing’. Yet few fundraisers and their consultants really understand what true testing means and how to conduct it.

As I noted in an earlier post, The Idiocy of Testing, one of the great barriers to growth in our sector is that despite the countless thousands of hours and millions of dollars spent on so-called ‘testing’, the result is navel-gazing at best and months or years of time wasted at worst. Years that could have/should have resulted in breakthroughs and growth.

Instead, most of the testing I’ve seen in the nonprofit sector is worthless, yielding little or nothing by way of insights and producing zilch when it comes to sustained change. Oh sure, there are lots of one-off ‘winners’, but theirs is a temporary, fleeting success. One winner for every 4, 5…20 losers; one step forward, two back, at best. The act of treading water. There is no sustained impact on net, head count or growth of any sort.

In recent years The Agitator has done a two-part series on Direct Mail Testing for Acquisition (Part 1 here and Part 2 here.)

Two years ago, in a post titled Direct Mail Testing to Nowhere, we once more warned that while the logic of the simple A/B test is sound, it is incredibly inefficient and unproductive. Given the usual manner in which this type of testing is conducted, it’s slow, painstaking and amounts to little more than a nudge forward. The affliction of massive incrementalism.

So, let’s try again.

Assuming you’re testing with growth and breakthroughs as your goal and not just going through the motions, what is the proper way to test?

Even more to the point, how do we break the all-too-common pattern of timid, take-little-risk behavior that infects both agencies and nonprofits? An infection that ends up testing only the marginal and incremental. Orange vs. blue envelopes … this letter signer vs. that letter signer … $25 vs. $45 … envelope sizes. Incrementalism to nowhere.

QUESTION: How do we conduct testing that is truly strategic and purposeful rather than habitual?

ANSWER: With discipline in the form of a proper plan and by meticulously following proper guidelines and methodologies for each test.

I’ve taken the Testing Plan and Protocol used by our sister company DonorVoice as a real-life example of proper testing. Feel free to copy it. More importantly, please use it. (In fact you might ask your consultant or agency to show you the process they use and compare notes.)

First, let’s start with an illustration of the worksheet DonorVoice uses for putting together each and every test they’re involved with.

[Image: DonorVoice testing plan template]

This Testing Worksheet/Planning tool is used by DonorVoice in conjunction with the following 10-point framework or protocol. Kevin Schulman, CEO of DonorVoice, says: “This testing protocol will lead to far fewer and more meaningful tests (a big plus), and more definitive decision-making regarding outcomes (another big plus).”

1) Allocate 25% of your acquisition and house file budget to testing.

2) Of the 25%, put 10% into incremental and 15% into big ideas.

An important corollary here: some of this money should go into researching ideas, or into paying others to do it. You can even use the online environment to pre-vet ideas with small, quick tests to gather data.

3) Set guidelines for expected improvement. 

Any idea for incremental testing must deliver a 5% (or better) improvement in house results, and 10% in acquisition (we’ll see why the difference in a minute). Any idea considered “breakthrough” must deliver a 20% (or better) increase.

4) Any idea – incremental or breakthrough – must have a ‘reason to believe’ case made for it, relying on a theory of how people make decisions, publicly available experimental test results, or past client test results.

The ‘reason to believe’ must include whether the idea is designed to improve response or average gift or both – this will be the metric(s) on which performance is evaluated.

A major part of this protocol is guided by the view that far more time should be spent on generating test ideas – and therefore on creating the necessary ‘rules’ and incentives to produce that outcome.

This may very well result in 3 to 5 tests per year. If they are well conceived and vetted, that is a great outcome.

5) Determine test volume with math, not with arbitrary ‘best practice’ test panels of 25,000 (or whatever).

Use one of the many web-based calculators (built on simple, underlying statistical formulas). Here is one DonorVoice likes, but there are plenty – all free.

An acquisition example: if our control response rate is 1% and we want to be able to flag a 5% improvement – i.e., a response rate greater than 1.05% – as real, the test panel would need to be 626,231 names (at 80% power, 95% confidence, two-tailed test). That 626,231 is not a typo.
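If you want to sanity-check a calculator’s output, here is a minimal sketch of the standard two-proportion sample-size formula in Python (scipy assumed). Calculators differ slightly in their variance assumptions, so expect a figure in the same ballpark as the 626,231 above rather than an exact match:

```python
from math import ceil
from scipy.stats import norm

def panel_size(p_control, relative_lift, alpha=0.05, power=0.80):
    """Names needed per panel to detect a relative lift in response rate
    with a two-sided, two-proportion test (equal panel sizes assumed)."""
    p_test = p_control * (1 + relative_lift)
    z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96 for 95% confidence
    z_beta = norm.ppf(power)            # ~0.84 for 80% power
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_test - p_control) ** 2)

# Control at 1% response; we want to flag a 5% relative lift (i.e., 1.05%)
print(panel_size(0.01, 0.05))  # roughly 637,000 names per panel
```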

How many acquisition test panels used in the history of nonprofit DM have produced meaningless results because of all the statistical noise? A sizeable majority, at least.

6) Do not create a ‘random nth’ control panel that matches the test cell size for comparison.

I don’t know how many nonprofits and agencies employ this approach, but it can lead to drawing exactly the wrong conclusion about whether the test won or lost.

The problem with a ‘random nth’ control panel of equal size to the test – e.g. two panels drawn random nth at 25,000 each – is that it creates a point of comparison with its own statistical noise, and far more of it than the main control with all the volume on it. A few retorts or excuses have surfaced in defense of this practice, but they are simply off-base.
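Here is an illustrative simulation of the point (numpy assumed; the panel sizes are made up). Both ‘controls’ are drawn from the very same 1% population, yet comparisons against the small control swing roughly 40% wider:

```python
import numpy as np

rng = np.random.default_rng(42)
p, trials = 0.01, 10_000    # true 1% response rate, identical everywhere

# Observed response rates across 10,000 simulated mailings
test  = rng.binomial(25_000, p, trials) / 25_000     # 25k test panel
small = rng.binomial(25_000, p, trials) / 25_000     # 25k 'random nth' control
big   = rng.binomial(500_000, p, trials) / 500_000   # full-volume control

# No real difference exists, but the small-control comparison is far noisier
print("sd vs. small control:", (test - small).std())  # ~0.00089
print("sd vs. big control:  ", (test - big).std())    # ~0.00064
```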

7) Determine winners and losers with math, not eyeballing it.

Use one of many web-based calculators to input test and control performance and statistically declare a winner or loser. Again, here’s DonorVoice’s free choice.
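For those who want the math rather than the web form, here is a minimal sketch of the underlying two-proportion z-test in Python (scipy assumed; the gift counts and panel sizes below are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

def two_prop_ztest(gifts_a, n_a, gifts_b, n_b):
    """Two-sided z-test: is panel B's response rate really different
    from panel A's?  Returns the z statistic and p-value."""
    p_a, p_b = gifts_a / n_a, gifts_b / n_b
    p_pool = (gifts_a + gifts_b) / (n_a + n_b)   # pooled response rate
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, 2 * (1 - norm.cdf(abs(z)))

# Hypothetical: control 5,000 gifts on 500,000 mailed; test 300 on 25,000
z, p_value = two_prop_ztest(5_000, 500_000, 300, 25_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")  # declare a winner only if p < 0.05
```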

8) Declare a test a winner or loser.

Add results to the ‘reason to believe’ document; maintain a searchable archive.

9) All winners go to full-volume rollout.

10) Losers can be resurfaced and changed with a revised ‘reason to believe’ case.

Denny Hatch, one of the best copywriters and direct mail veterans in the business and editor of Business Common Sense, reminds us of the late Ed Mayer’s admonition: “Don’t test whispers.” Meaning: small, incremental changes (‘whispers’) produce only incremental results, not worth whispering about, let alone shouting.

Whether up or down, tiny changes hardly matter and they cost lots of time and money. So, put the DonorVoice testing discipline, or one as rigorous, to work for your future.

What’s your experience with testing?

Roger

P.S. In the Agitator Toolkit you’ll find the description of a highly accurate, fast and inexpensive technology for testing literally hundreds or even thousands of variables at a time. You might want to explore this. It’s why we labeled it “18 Months’ Worth of Testing in a Day.”

Here’s a short video describing how the process works.

One response to “The Curse of Testing Illiteracy”

  1. John Stith says:

Love the overall summary and points! But I think your example of 626,231 might be misleading. I believe the evanmiller.org sample-size calculator assumes the control panel is the same size as the test. You rightly point out that the control should be larger – and when it is, you can see significant results with somewhat smaller test panels.