Climbing the Mountain of Success: Why Your 3% Improvement Might Still Leave You at Base Camp
You just ran a simple, low-cost A/B test and improved something by 3%. Your average gift went up from “meh” to “slightly less meh.” These can feel like wins that matter.
But if your goal is building the brand and growing the supporter base year over year, that win might be akin to setting out to scale Everest and instead finding yourself atop the tallest slide at the local park. Where I grew up, in the wilds of Southwest Virginia, the “mountain” I’d be king of is a whopping 5,730 feet tall. Where I live now, it’s a staggering 486 feet above sea level; a toddler could probably stack Legos higher.
Enter Avinash Kaushik, the analytics wizard, who tells us about the time Microsoft tried to conquer the world with Surface tablets by getting them into every NFL sideline shot. Great for visibility, right? Except, oops, Surface’s market share is still chilling at a cozy 0.29%. They nailed a local maximum, great ROI on the product-placement deal, but in the global game of “Who’s Actually Buying This?”, they’re barely on the board.
It’s like focusing on which color envelope gets you more donations. That’s a one-time lift in a locally optimized world. You cannot “scale” it; there is no world where changing all your OEs to that “winning” color keeps delivering. Massive diminishing returns set in. We didn’t really prepare our path up Everest; we just painted our tiny hill a different color.
The problem is compounded if your idea-generation process for the best teaser or envelope color has no evidence or reason to believe behind it. Then you have a needle-in-a-haystack chance of even being locally optimized, i.e., of finding the best darn teaser possible instead of just the best you could come up with at the time. The toy sketch below shows how greedy, step-by-step optimization strands you on whatever hill you happen to start on.
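To make the local-versus-global trap concrete, here’s a toy sketch in Python (my construction, not anything from Kaushik; the landscape and its numbers are purely illustrative): a greedy hill-climber on a terrain with my 486-foot hometown hill and an Everest-sized peak farther away.

```python
# Toy illustration of local vs. global maxima (illustrative numbers only).
# Greedy optimization, like iterating on envelope color, climbs whatever
# hill it starts on and stops at the first peak it finds.

def landscape(x: float) -> float:
    """Two hills: a small one near x=1 (the painted hill), a tall one near x=8 (Everest)."""
    small_hill = 486 * max(0.0, 1 - abs(x - 1))        # ~486 ft, like my local "mountain"
    everest = 29032 * max(0.0, 1 - abs(x - 8) / 2)     # the actual goal
    return max(small_hill, everest)

def hill_climb(x: float, step: float = 0.1, iters: int = 1000) -> float:
    """Greedy local search: move only if a neighboring point is higher."""
    for _ in range(iters):
        if landscape(x + step) > landscape(x):
            x += step
        elif landscape(x - step) > landscape(x):
            x -= step
        else:
            break  # no neighbor is higher: we're at *a* peak, not *the* peak
    return x

peak = hill_climb(0.5)  # start near the small hill
print(f"Stopped at x={peak:.1f}, elevation {landscape(peak):.0f} ft")
# Stopped at x=1.0, elevation 486 ft -- locally optimal, nowhere near Everest.
```

That’s every envelope-color test in a nutshell: each step genuinely improves on the last, and you still end up 486 feet up.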
A way to break out of this is to go back to your seventh-grade science project: start with a hypothesis. More specifically, start with a hypothesis that, if proven one way or the other, will change how you do business.
Take the Nudge-Award-winning test in which UNHCR posted a 42% lift when it presented its symbolic gifts symmetrically (e.g., five blankets, seven blankets, nine blankets versus blankets, radiator, and stove). This change, while small, carries through to every donation form and every response device they use. And the test ideas had theory and rationale behind them, i.e., a reason to believe.
As a side note, how do you know whether you have theory or rationale behind your test idea? Whether your test wins or loses, you can give a very specific answer to the “why” question afterward. If your answer is “don’t know” or something superficial (“because people prefer chartreuse”), then you don’t.
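As for what “proven one way or the other” looks like mechanically, here’s a minimal sketch with made-up numbers (a hypothetical 50,000-piece control against a theory-driven treatment, not any real campaign), using a standard two-proportion z-test on response rates:

```python
# Minimal sketch of judging an A/B lift in response rate.
# The counts below are hypothetical placeholders.

from math import sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Z-statistic for the difference between two response rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under the null hypothesis
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical mailing: control vs. a treatment built from an actual theory.
z = two_proportion_z(conv_a=500, n_a=50_000, conv_b=580, n_b=50_000)
print(f"z = {z:.2f}")  # |z| > 1.96 => significant at the usual 95% level
```

The statistics only tell you whether the lift is likely real; the hypothesis is what tells you what to do with the answer.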
But Kaushik’s look at global optimization also raises the question of whether we’re looking at the right goals.
There’s nothing wrong, and much that’s right, about setting goals for metrics like response rate, average gift, and net revenue per communication, or larger goals like hitting your net revenue budget and an aspirational file size.
But all of these could use a refinement like the one Kaushik recommends for the Microsoft Surface: instead of measuring brand recall, test whether people are more likely to choose the Surface (since buying is the behavior you’re actually shooting for).
So, for example, do you really want to measure file size? Or do you want to measure the file size of the donor segment with a connection to your cause, which you’ve determined is far more profitable than donors without one?
Donors are more valuable to nonprofits than organizations are to their donors. A global-maximum metric needs to measure, or at least approximate, whether you are leaving your donor potential greater than you found it. A “successful” mailing that nets $100,000 isn’t successful if it turns off donors with lifetime values of $200,000.
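That closing arithmetic is worth making explicit. Here’s a back-of-the-envelope sketch using the numbers above, with the attrition inputs (how many donors lapse, and what they were worth) as hypothetical placeholders:

```python
# Back-of-the-envelope: immediate net revenue vs. the future value you
# burn by turning donors off. Attrition inputs are hypothetical.

immediate_net = 100_000      # the "successful" mailing's net revenue
avg_lifetime_value = 2_000   # assumed remaining LTV per affected donor
donors_turned_off = 100      # assumed donors who lapse because of the mailing

future_value_lost = donors_turned_off * avg_lifetime_value  # $200,000
true_net = immediate_net - future_value_lost

print(f"Reported net:      ${immediate_net:,}")
print(f"Future value lost: ${future_value_lost:,}")
print(f"True net:          ${true_net:,}")  # -$100,000: a global-maximum loss
```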
Kevin