How’s that for counterintuitive, eh? But it’s a genuine problem, as Ars Technica explains:
The problem is that our statistical tools for evaluating the probability of error haven’t kept pace with our own successes, in the form of our ability to obtain massive data sets and perform multiple tests on them. Even given a low tolerance for error, the sheer number of tests performed ensures that some of them will produce erroneous results at random.
The problem now is that we’re rapidly expanding our ability to do tests. Various speakers pointed to data sources as diverse as gene expression chips and the Sloan Digital Sky Survey, which provide tens of thousands of individual data points to analyze. At the same time, the growth of computing power has meant that we can ask many questions of these large data sets at once, and each one of these tests increases the prospects that an error will occur in a study; as Shaffer put it, “every decision increases your error prospects.” She pointed out that dividing data into subgroups, which can often identify susceptible subpopulations, is also a decision, and increases the chances of a spurious error. Smaller populations are also more prone to random associations.
In the end, Young noted, by the time you reach 61 tests, there’s a 95 percent chance that you’ll get a significant result at random. And, let’s face it—researchers want to see a significant result, so there’s a strong, unintentional bias towards trying different tests until something pops out.
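For the curious, the arithmetic behind that figure is easy to sketch. The quote doesn't give the significance threshold, so this assumes the conventional 0.05 cutoff and, as a simplification, that every test is independent of the others; the function name is mine, not anything from the article.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of num_tests independent tests comes up
    'significant' purely by chance, when each has false-positive rate alpha.

    Assumptions (not from the quoted article): alpha = 0.05 and fully
    independent tests.
    """
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them do so with probability (1 - alpha) ** num_tests.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    for n in (1, 10, 20, 61, 100):
        print(f"{n:>3} tests: {chance_of_false_positive(n):.1%}")
```

Under those assumptions, 61 tests works out to roughly a 96 percent chance of at least one spurious hit, which is in line with the number Young quotes. Real tests on the same data set are rarely independent, so treat this as a back-of-the-envelope illustration rather than a precise estimate.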
Especially when money and funding get involved, I’m sure. There’s no conspiracy involved, just the psychic momentum of a human institution trying to maintain the status quo. A sort of collective mental flywheel, if you like; the same thing happens with political parties all the time, but they don’t have the same self-checking instinct that science does.
Between this and the rising efficacy of the placebo effect, I’ll bet it’s a weird time to be a medical practitioner… not to mention a patient.