Lies, D*mn Lies, and Statistics

There are 11 types of lies. (In binary…)

When I announced my blog project on LinkedIn, I did so with the following post:

I’m a man of my word.

I’ve since completed nine hours of coursework in my M.S. in Health Data Science program, and I have learned a great deal in the process. However, one point that has been driven home for me over and over again is that you cannot trust statistical models. Well, that’s not exactly right. It is more accurate to say “you should not blindly trust statistical models.” Early on, in one of my first classes, I put together a project whose analysis seemed to point to a specific conclusion, only to discover that my data wildly violated a core assumption of the modeling technique I had used. Not only was the conclusion invalid; the truth turned out to be exactly the opposite!

I wonder what the penalty is for a stick to your own face.

In my inferential analysis course, we spent a decent chunk of time covering various modeling techniques, exploring the core assumptions each one makes, and learning how to test whether the model outputs could be trusted. While I don’t plan to deep dive into the various techniques, assumptions, and tests on the blog (unless you really, really want me to), I did want to take a moment and call this out specifically: if you don’t actually care about accuracy and are simply trying to craft facts and numbers to fit a specific agenda (and shaaaaaame on you if that’s the case), then it’s fairly easy to deliberately manipulate statistical modeling tools into producing whatever output you need.

The insidious thing about this is that it’s not necessarily easy to tell when someone has done it. Running the tests takes knowledge, skill, and access to the data. And explaining to the masses why the statistics supporting a worldview they happen to hold aren’t valid is usually met with a big yawn, if not with questions about your motives and outright accusations of intentional bias. What’s even more depressing is that the average citizen can’t (or won’t) push back on even a simple data manipulation trick like inverting a chart axis, as long as the erroneous conclusion matches what they already believe.

Whoo! Look at that trend line!
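For anyone wondering how an inverted axis pulls this off, here’s a minimal Python sketch. The numbers are made up, and the helper functions (`screen_heights`, `looks_like_rising`) are hypothetical, written purely for illustration; they don’t model any real charting library. The sketch maps each value to a vertical position on a chart and checks whether the line appears to end higher than it starts.

```python
# Hypothetical yearly values that are actually DECLINING (made-up numbers for illustration).
values = [50, 45, 41, 36, 30]


def screen_heights(data, inverted=False):
    """Map each data value to a vertical position on a chart: 0 = bottom, 1 = top."""
    lo, hi = min(data), max(data)
    heights = [(v - lo) / (hi - lo) for v in data]
    # With an inverted axis, the largest value is drawn at the bottom instead of the top.
    return [1 - h for h in heights] if inverted else heights


def looks_like_rising(heights):
    """A quick glance judges the trend by whether the line ends higher than it starts."""
    return heights[-1] > heights[0]


normal = screen_heights(values)
flipped = screen_heights(values, inverted=True)
print(looks_like_rising(normal))   # False: the line slopes down, matching the data
print(looks_like_rising(flipped))  # True: same data, but the line now appears to climb
```

Nothing about the data changed between the two calls; only the direction of the axis did. That’s why it pays to read the axis labels before trusting the slope of a line.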

As promised, I am making errors as I write. That’s part of learning, and doing it in public is what I call “learning out loud.” In my case, most errors result from a lack of knowledge, a short-term situation I’m correcting as I gain more information and skills. But do beware of those who are driven by an agenda. It’s exceptionally easy for them to produce complicated-looking numbers and statistical models that seem to validate their point of view, and if you don’t know how to pressure-test their assumptions, you can easily be lulled into believing them. A good general rule of thumb: the more “clear cut” something looks and the more it lines up with what you want to be true, the less you can trust your own interpretation of it. Know how to verify, or work with people who can, before you let yourself get swallowed up by dubious statistics.

</soapbox>

Data Science, Big Data, & Cloud nerd with a focus on healthcare & a passion for making complex topics easier to understand. All thoughts are mine & mine alone.