The wider community seems unaware that saying someone is "data-mining" is not a compliment; rather, it is an insult aimed at those who don't know how to do good applied statistics.
Suppose you use your computer to carry out 1000 statistical analyses. A result significant at the 0.1% level is then no longer impressive evidence of a correlation; it is just the sort of thing you were likely to find by chance across so many runs. In fact, if you keep doing runs, at some point your ability to find anything of true statistical significance goes out the window. That's why good statistics still involves having a small number of good theoretical priors and testing them.
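To put a number on the 1000-analyses point, here is a rough Python sketch. It assumes, purely for illustration, that nothing real is going on in any of the analyses and that they are independent of each other (real reruns on the same data usually aren't). It computes the chance of at least one "significant" result at the 0.1% level and checks it with a small simulation, using the fact that p-values are uniformly distributed when the null hypothesis is true. The function names are made up for this example.

```python
import random

def prob_at_least_one_false_positive(n_tests: int, alpha: float) -> float:
    """P(at least one p < alpha) among n independent tests where nothing is real."""
    return 1.0 - (1.0 - alpha) ** n_tests

def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of chance 'hits' among n tests where nothing is real."""
    return n_tests * alpha

def simulate(n_tests: int, alpha: float, projects: int = 2000, seed: int = 1) -> float:
    """Monte Carlo check: under a true null, p-values are uniform on [0, 1),
    so each simulated project draws n_tests uniform p-values and we count
    the fraction of projects with at least one p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(projects):
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / projects

if __name__ == "__main__":
    n, alpha = 1000, 0.001  # 1000 analyses, each judged at the 0.1% level
    print(f"expected chance hits:       {expected_false_positives(n, alpha):.2f}")
    print(f"P(at least one hit), exact: {prob_at_least_one_false_positive(n, alpha):.3f}")
    print(f"P(at least one hit), sim:   {simulate(n, alpha):.3f}")
```

The exact figure comes out at about 0.632: even with nothing real in the data, roughly two projects in three would turn up at least one 0.1% "hit", and on average you'd expect one.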
We already have the problem in science that, because of publication bias, it is mainly papers with nice results that get published. And those scientists probably did quite a few runs to get one with a nice result. So when you read all those papers reporting significance at the 1% level, far more than 1% of them arose by chance. Some have estimated that nearer 30% of published medical papers report what are actually chance results, as a consequence of all this bias in favour of nice-looking results.
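The "quite a few runs" step can be illustrated the same way. This Python sketch is not a model of the 30% estimate; it only shows how the nominal 1% grows when a researcher keeps trying analyses until one crosses the threshold. It assumes, for illustration only, that the hypothesis being tested is actually false and that the attempts are independent. `chance_of_spurious_hit` is a hypothetical helper name.

```python
def chance_of_spurious_hit(attempts: int, alpha: float = 0.01) -> float:
    """Probability that a study with no real effect still produces at least
    one p < alpha when `attempts` independent analyses are tried."""
    return 1.0 - (1.0 - alpha) ** attempts

if __name__ == "__main__":
    # How the nominal 1% level inflates with the number of analyses tried.
    for k in (1, 5, 10, 20, 50):
        print(f"{k:>3} runs -> {chance_of_spurious_hit(k):.1%} chance of a 'nice' result")
```

With 20 tries the chance of a spurious 1% result is already about 18%, and with 50 it is close to 40%, which is why the published 1% figures can't be taken at face value.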
Reminds me of that recent paper that came out saying that coffee drinking (at 5+ cups a day) is good for you, at least in the sense of reducing the risk of heart disease. Curiously, nearly all previous studies had come to the opposite conclusion.