Laws, Sausages, and Third-Party Data
More and more fundraisers are falling in love with Big Data. Some use it to create “personas” in hopes of better segmenting their files. Others employ it for wealth screening and prospect research. Whatever use you make of it, every data point should move your organization at least one step closer to the donor.
Yet despite all these digital breadcrumbs, it turns out that fundraisers might know less about individual donors than they think. But, you say, “the numbers don’t lie”—or do they?
What if much of the data is far less accurate than we expect it to be?
In my last post I talked about Kimberly Ellinger, my phantom family member who has been following me around for 15 years through bad data compounded by promiscuous list exchanges.
Some amusing commentary followed, including one reader who gets mail for Tom Hanks (and not just in the sense of owning the You’ve Got Mail DVD).
Sadly, the issue of third-party data is much broader, and worse, than I’d even made it out to be. A July report from Deloitte looked at how accurate commercially available data is. Considering the title of the report is Predictably Inaccurate, you can likely guess what’s coming, but even I, a third-party data cynic, was boggled:
- 59% of those responding to the study (Deloitte professionals who were given their own profiles sourced from a mainstream data broker) reported that the demographic data obtained from brokers was 50% or less accurate, even for simple, easily available data like date of birth, marital status, and number of adults in household.
- 84% said data-broker data was 0-50% correct about their economic data. So beware that wealth append you just did.
- 75% said their vehicle data was less than 50% correct, including 44% who said it was zero percent correct. (Consider that, in the United States, 22% of cars are white, 40% of all vehicles are SUVs, and 14% are Fords. Thus, if I guessed you drove a white Ford SUV, I’d only be completely wrong 40% of the time. Let me know if you want to hire me as a third-party data broker.)
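For anyone who wants to check my white-Ford-SUV math, here is a minimal sketch. It assumes color, body style, and make are independent of one another, which is my simplification and certainly not how real cars work. The function name `prob_all_wrong` is made up for illustration.

```python
# Shares quoted above for the United States.
p_white = 0.22  # cars that are white
p_suv = 0.40    # vehicles that are SUVs
p_ford = 0.14   # vehicles that are Fords


def prob_all_wrong(*shares):
    """Chance that every guessed attribute misses.

    Assumption: the attributes are independent, so the chance of
    missing all of them is the product of the individual miss rates.
    """
    result = 1.0
    for share in shares:
        result *= 1.0 - share
    return result


completely_wrong = prob_all_wrong(p_white, p_suv, p_ford)
print(f"Completely wrong: {completely_wrong:.1%}")  # Completely wrong: 40.2%
```

So a blind guess gets at least one vehicle detail right about 60% of the time, while 44% of respondents said their broker's vehicle data was zero percent correct.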
Cars aren’t the only category where you’d want me to be your data broker. Fewer than 20% of people said their number of children was correct. Yet, 41% of mothers have two children, according to Pew surveys (yes, this assumes the person has children, but even correcting for this, you’d be better off guessing two rather than using a data broker).
The report is worth reading if only for some of the responses, which include these gems:
“It said I was single (I am married), I have no children (I have six), and I vote Democrat (I often vote Republican).”
“If my data is representative, this seems pretty useless.”
Most third-party data brokers are drawing from the same well, especially the data provided by the big three credit reporting agencies. So you can see exactly how accurate third-party data is for you at aboutthedata.com from Acxiom. (You do have to register, but what’s the worst that could happen in the safe hands of one of the big credit reporting agencies?) (Don’t answer that).
Mine says that I:
- Have half the children I do (should have guessed two!)
- Vote for the other political party
- Have the date of my house purchase right, but number of years in the house wrong (How?! It’s just subtraction!)
- Am in the market for female apparel (those who have seen my sartorial stylings know I don’t even spend on male apparel)
- Don’t donate to anything but political causes (there would be a bunch of nonprofits surprised by this…)
- Like cooking magazines
- Love gardening and crafts – doing them, reading about them, buying stuff for them
All demonstrably untrue. And yet, in all too many cases, these are the things we are feeding into our models. Then GIGO – Garbage In, Garbage Out.
How did big data get so polluted? The report covers that as well:
- Outdated information that isn’t worth the cost of updating
- Incomplete information that isn’t worth the cost of completing
- Incorrectly collating multiple data sources
- Incorrect inferences (think of what happens to your Amazon recommendations when you forget to mark something as a gift)
- Incorrect models
- Corruption by malicious parties
Allow me to summarize a primary reason for this mess: lack of financial incentives to do better.
Most models are black boxes. You put your donors/constituents in, you get a score out. Easy peasy. And when we get it wrong, we don’t know because the constituent who starts getting ads in Spanish or honor/memorial solicitations (both ads I now get from nonprofits) doesn’t care enough to report back that they haven’t lost a loved one or spontaneously learned to speak in a new tongue.
So this is a peek into how the sausage is made. Turns out there is some Upton Sinclair-level stuff going on: the digital equivalent of the slaughterhouses described in The Jungle.
Don’t get me wrong. There are quality-oriented providers of good and valuable data. And expert builders of accurate models. But, it’s up to the fundraiser to understand the sources and quality of the data being added to the donor file.
In short, dine on these data sausages at your peril.
Nick
Nick,
I’m sharing this with several people (and suggesting that they become subscribers), because I often encounter what I call the “cringe factor” when clients — or clients of clients — talk about using data screening.
The best, or worst, was the health care organization that wanted to find “wealthy individuals in their area.” Oh, did I mention it was for a capital campaign?
So, with apologies to Twain or Bismarck: “Some mailing lists are like sausages; it is better not to see them being made.”