Acquisition: Direct Mail Testing – Part 2
One of the biggest pitfalls in direct mail testing is the ‘baby and the bath water’ problem.
The problem occurs when an organization or its consultant creates and mails a test package with numerous test elements.
Or to put it in the vernacular, a whole bunch of stuff is changed.
When this happens, the results for that package are, at best, only a crude measuring rod for performance. Why? Because the results reveal only a thumbs up or thumbs down for the entire package.
This ‘everything but the kitchen sink’ approach provides zero guidance as to whether individual components were well received (i.e., the ‘baby’) or if everything (i.e., the ‘bath water’) needs to be changed.
This happens all the time. The only alternative, which, as a general rule, NEVER happens, is to deconstruct the totally new package into a series of A/B tests, with each test panel including only a single change.
But, even if this were done, it would take forever and a day to execute. That’s why so many folks simply try to read the tea leaves and infer or guess — based on ‘years of experience’ and ‘past testing’ — about why a package did poorly and what might be salvageable.
Clearly, this is a flawed process fraught with layers of personal bias and most likely a dollop of committee bias thrown in for good measure.
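To put some rough numbers on “forever and a day”, here is a back-of-the-envelope sketch in Python. Every figure in it (number of elements changed, variants per element, panel size, drops per year) is hypothetical and purely illustrative:

```python
# Hypothetical numbers only: why single-change A/B testing of a totally new
# package "takes forever and a day".
elements = 5              # components that changed vs. the control (OE, letter, etc.)
variants_per_element = 3  # the control version plus two alternatives for each
panel_size = 25_000       # names per test panel to read ~1% acquisition response reliably
drops_per_year = 4        # acquisition mailings per year
tests_per_drop = 2        # clean A/B reads you can squeeze into one drop

# One panel per non-control variant, read one element at a time.
single_change_panels = elements * (variants_per_element - 1)
names_mailed = single_change_panels * panel_size
years_needed = single_change_panels / tests_per_drop / drops_per_year

# And even then you have not tested how the elements behave in combination.
all_combinations = variants_per_element ** elements

print(f"Single-change A/B panels: {single_change_panels}")        # 10
print(f"Names mailed just for testing: {names_mailed:,}")         # 250,000
print(f"Calendar time to read them all: {years_needed:.2f} years")  # 1.25
print(f"Package combinations still never tested: {all_combinations}")  # 243
```

Even with these generous assumptions, you have spent a quarter of a million acquisition names and more than a year of calendar time, and you still have no read on how the elements perform in combination.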
There is a better, empirical way. For the past 30 years, product developers have used a survey-based methodology to identify and separate the baby from the bath water. This process can be done in days versus months, costs a fraction of what traditional testing costs, and, as a recent client told us, is “like doing 18 months of testing in a day”. To learn more about the methodology, click here.
To illustrate how this far more time-efficient, cost-efficient process works, here is a recent example of the old-fashioned way. Client X mails a totally new package – different OE, letter format, letter copy, inserts – against the control. New package performs poorly. Money is lost. Time is lost. New package is thrown out.
The better way. The New Package is never mailed, because the client pre-tested it (along with hundreds of other package combinations) and determined that, as constructed, it would not perform well against the control. Thousands of dollars are saved, many insights are gained that would have taken years to accumulate with conventional A/B Testing, and the cost of the pre-test is more than covered.
MOST IMPORTANTLY, a new ‘baby’ or package element is potentially discovered when two components of the new package test quite well (as determined by an actual score assigned to every single element). These New Elements are now live tested in the control package and replace elements of the control that are identified as weak.
New ‘bath water’ (i.e., poor performing test elements) is unfortunately easy to create; but ‘baby elements’ (i.e., individual winning elements) are much more difficult to identify.
Are you or your consultants identifying the winning ‘babies’? How?
Roger
P.S. In these two posts on direct mail testing I’ve deliberately avoided any rants on the shortsighted, suicidal tendency of many organizations to avoid any serious testing for a variety of reasons.
The other day an Agitator reader by the name of Sara cited some of the reasons in our Comments section: “I have always thought that the reason most organizations don’t believe in testing is because of the added cost. However, I am starting to wonder if a large part of the ‘problem’ with testing is that it requires additional planning and lead-time (it can be hard enough to meet regular deadlines when a project has been held up in committee…)”
Any organization that fails to devote a substantial part of its direct response budget to disciplined testing faces a future that grows dimmer and dimmer by the day.
I often read that surveys don’t work because fundraising works on a subconscious level and people don’t know what appeals to them consciously, or why. That you should trust ONLY response rates (on frequently tested packages with appropriate controls). Why would this be an exception?
Roger,
Would you consider writing some more about testing in a small organization? Small organizations don’t have the budget or often the numbers to make testing feasible and/or useful. It would be interesting to see if there’s a way to use the principles you talk about in a way that would be useful on a much smaller scale.
Thanks!
Hi, this is Kevin Schulman, founder of DonorVoice, the company referenced in Roger’s recent posts on testing. I thought I’d take the liberty of weighing in on Patti’s excellent question about survey data and how reliable it is.
Let me start with a man-bites-dog comment: as a survey researcher by formal training and practice, I abhor most survey research. It is, too often, poorly conceived, constructed, analyzed and applied. I can say this having been guilty of some or all of these sins in the past.
That being said, when done correctly, it is extremely useful, even critical if non-profits are ever going to achieve sustainable growth on their file.
An important corollary to all this is that many (dare I say, most) of the folks in the non-profit/fundraising arena who espouse points of view on survey research are often wrong but never in doubt.
Those in the social sciences have learned a lot about how people process information, make choices and, by extension, what they can and cannot answer reliably. For example, we as humans are very good at discriminating at the extremes, at expressing a global preference for product/package A vs. product/package B. A good metaphor is the eye test or the “is it better this way or that way?” question posed by the ophthalmologist.
We are also very good (i.e. we can answer reliably) at picking the “best” and “worst” from a set of choices.
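For readers who like to see the mechanics, this best/worst exercise is a form of best-worst scaling (often called MaxDiff). Below is a bare-bones counting sketch, using made-up package elements and a deliberately simplified score (not our actual model), showing how those picks become an element-level score:

```python
from collections import defaultdict

# Simplified MaxDiff-style scoring: invented elements and responses, and a basic
# count score (best picks minus worst picks, divided by times shown).
responses = [
    # (elements shown, pick for "most appealing", pick for "least appealing")
    (["teaser OE", "plain OE", "premium insert", "4-page letter"], "premium insert", "plain OE"),
    (["teaser OE", "2-page letter", "premium insert", "window OE"], "premium insert", "teaser OE"),
    (["plain OE", "4-page letter", "window OE", "2-page letter"], "2-page letter", "plain OE"),
]

shown, best, worst = defaultdict(int), defaultdict(int), defaultdict(int)
for items, best_pick, worst_pick in responses:
    for item in items:
        shown[item] += 1
    best[best_pick] += 1
    worst[worst_pick] += 1

scores = {item: (best[item] - worst[item]) / shown[item] for item in shown}
for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:15s} {score:+.2f}")
```

Simple counts like these are only a starting point, but they illustrate why best/worst questions produce data we can actually trust.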
What we are lousy at (i.e., cannot answer reliably) is explaining why we hold these global, package-level preferences. If asked, we will give an answer but, for any number of reasons (including it being at a subconscious level we can’t faithfully access), we don’t give reliable answers.
This methodology focuses on asking what we are good at answering and deriving (via back-end analysis) that which is critical to know but not directly answerable by the respondent. It is also a methodology that comes as close to the real world as possible by presenting holistic concepts and getting a preference measure on the overall package – versus the artificial (and bad-data-producing) method of getting separate measures on the component parts. By creating thousands of hypothetical combinations through experimental design, we can control for and isolate the individual impact of each component part or test idea (e.g. different teaser copy on the OE, size/shape, letter copy, premiums, etc.).
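Here is a stripped-down illustration of that back-end analysis. The element names, the “true” effects used to simulate respondent ratings, and the simple main-effects scoring are all invented for the example; the real designs and models are considerably more involved. The point is only to show how rating whole packages, across a designed set of combinations, lets you derive a score for each individual element:

```python
import itertools
import random
import statistics

# Invented example: three elements, each with a few alternatives.
elements = {
    "OE":     ["plain", "teaser", "window"],
    "letter": ["2-page", "4-page"],
    "insert": ["none", "premium", "survey"],
}

# The designed set of whole packages (here, all 3 x 2 x 3 = 18 combinations;
# in practice a fractional design covers a much larger space).
design = [dict(zip(elements, combo)) for combo in itertools.product(*elements.values())]

# Simulated respondent ratings of each WHOLE package on a 0-10 scale.
# The "true" effects below are fabricated purely to generate data.
random.seed(1)
true_effect = {"teaser": 1.5, "window": 0.5, "4-page": -1.0, "premium": 2.0, "survey": 0.8}
rating = [5 + sum(true_effect.get(level, 0) for level in pkg.values()) + random.gauss(0, 0.5)
          for pkg in design]

# Back-end analysis: an element level's score is the average rating of the
# packages containing it minus the grand mean (a main-effects estimate,
# valid here because the design is balanced).
grand_mean = statistics.mean(rating)
for element, levels in elements.items():
    for level in levels:
        level_mean = statistics.mean(r for pkg, r in zip(design, rating) if pkg[element] == level)
        print(f"{element:>7s} = {level:<8s} score {level_mean - grand_mean:+.2f}")
```

On this toy data the derived scores track the fabricated effects (the premium insert strongly positive, the 4-page letter negative), which is exactly the kind of score-per-element Roger referred to above.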
The proof is, of course, in the pudding, and the bottom line is that it works – not perfectly, of course, but that isn’t the measure of efficacy. What it does do is radically improve the current, often haphazard “process” of deciding (with limited resources and time) what to test by pre-identifying the most likely winning ideas, thus increasing the odds of success, reducing time and expense, and getting to rollout faster. It has the added and crucial benefit of providing the luxury of testing in a risk-free environment – meaning ideas that would never make it into the mail stream (out of fear) get an opportunity to “win” or be considered at no great reputational or financial risk. Big and radical ideas can and do work, but only if tried.
One final soapbox thought on those who argue for only using response rate as the arbiter of success. When response rates are hovering around the 1% range (in acquisition), it is odd to consider the “winner” a success. A different question or consideration should be: what about the 99%, the non-responders? How does this sector start stealing market share from the commercial sector, something it has failed to do in the last 40-plus years, given that the percentage of the US population who is philanthropic is unchanged over that period?
The idea that a very different product/offer from the “winning” control isn’t capable of tapping into the 99% seems shortsighted at best. Building a new product or offer takes time and money. It also takes big ideas, failing fast, iterating and improving – i.e., innovation. This methodology can help shorten the innovation cycle by allowing big new ideas among big new segments (i.e., the 99%) to be explored more quickly and inexpensively than the current, proven but incredibly slow and inefficient testing process that is largely focused on incremental change for incremental gain among the exact same but dwindling base of support.
Color me skeptical.
In the early 2000s my public TV station took part in a six-station test where focus group volunteers, using handheld response dials, viewed graphic and copy elements of six different acquisition mail packages, indicating their positive or negative reaction to each element.
Then these packages were actually mailed in statistically valid quantities.
The results for each of the stations participating: the package that the focus group members hated the most performed the best.
I’ve learned in my years of experience that how people say they’ll behave (what they like) is very different from how they actually respond with a gift.
While focus groups and surveys can offer great insight, I’m not sure they should be how we make fundraising decisions. Even though it’s expensive and time-consuming, nothing replaces a good testing strategy.
Anne Ibach
Director of Membership
Oregon Public Broadcasting
Anne,
I’m very familiar with the methodology you cite; I used to own a company that had 100 such dials for exactly the type of research methodology you are describing.
That approach has some qualitative value when combined with quant, and I could go on, but won’t…
Suffice to say, the approach and methodology we are using is night and day different from what you describe.
One important clarification is that we are not trying to replace live testing. What we are trying to replace is the haphazard, risk-averse-to-the-point-of-suffocating, inefficient and subjective “process” that is in place for deciding WHAT TO TEST. If we are compared to this “process”, it is hard to imagine how anybody can argue there isn’t room for massive improvement.
I’m always keen to turn a doubter into a believer. I am willing to engage in a pay-for-performance project where Oregon Public Broadcasting only pays us if it works – i.e., what you mail live performs as we predict.
I am willing to wager there is significant room to optimize and improve on the member offer going out today.