25-June-2008, 08:47 AM
Ivan Viehoff

Quote:
Originally Posted by aurora
Anyone who wants to know more about Data Mining, here is a PDF file of a presentation by Richard De Veaux on lessons learned in data mining through practical experience. He's a professor of math and statistics. I attended a short course he gave on Data Mining a bunch of years ago; it was quite interesting and he is an engaging speaker.
He gives a very interesting case there: he has about 1000 data points but 149 potential explanatory variables, which may or may not include the thing that matters. Nothing stands out. The R² of a linear fit between the response and any one of them is just 1%, so most likely it's some combination of things, and it probably isn't even linear. It strikes me as obvious that any systematic data exploration tool let loose on that will find plenty of purely random correlations.
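To put a rough number on "plenty of random correlations", here is a minimal sketch, not anything from the presentation. It assumes the worst case, that all 149 variables are pure Gaussian noise unrelated to the response, and counts how many still pass a naive 5% significance screen. The names `pearson_r` and `screen_noise`, the seed, and the approximate cutoff 1.96/sqrt(n) are my own choices for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def screen_noise(n_points=1000, n_vars=149, seed=0):
    """Correlate a noise response with n_vars independent noise variables."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n_points)]
    r_values = []
    for _ in range(n_vars):
        x = [rng.gauss(0, 1) for _ in range(n_points)]  # unrelated to y by construction
        r_values.append(pearson_r(x, y))
    return r_values


r_values = screen_noise()

# Approximate two-sided 5% cutoff for |r| when the true correlation is zero.
threshold = 1.96 / math.sqrt(1000)
count = sum(1 for r in r_values if abs(r) > threshold)
best_r2 = max(r * r for r in r_values)

print(f"{count} of {len(r_values)} noise variables look 'significant' at 5%")
print(f"largest single-variable R^2 among them: {best_r2:.4f}")
```

With 149 independent tests at the 5% level you would expect roughly 7 or 8 false hits on average, and the best of them can easily show an R² approaching the 1% mentioned above without meaning anything.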

In a situation where you have data but no a priori model, there really isn't anything you can do but mine the data and see what you get. But don't believe you have discovered anything, oh no. All you have is a hypothesis. What you then have to do is set up an experiment that really does have statistical power to test that hypothesis, because there just isn't any statistical power left in the data you already have: it was used up in finding the hypothesis. You can probably design a much cheaper experiment that focuses explicitly on the one hypothesis you have made.
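As a rough illustration of why the old data can't confirm its own findings, here is a small simulation under my own assumptions: again every variable is pure noise, the "discovery" is simply the best of 149 correlations in the exploration data, and the follow-up experiment measures only that one variable and the response on fresh cases. All names and the seed are made up for the sketch.

```python
import math
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def noise(rng, n):
    """n independent standard normal draws."""
    return [rng.gauss(0, 1) for _ in range(n)]


rng = random.Random(1)
n_points, n_vars = 1000, 149

# Exploration: mine the existing data and keep whichever variable looks best.
y_old = noise(rng, n_points)
old_r = [pearson_r(noise(rng, n_points), y_old) for _ in range(n_vars)]
best_index = max(range(n_vars), key=lambda i: abs(old_r[i]))
r_explore = old_r[best_index]

# Confirmation: a new experiment measuring only the chosen variable and the
# response. The truth here is "no relationship", so the new measurements are
# simply fresh independent noise.
y_new = noise(rng, n_points)
x_new = noise(rng, n_points)
r_fresh = pearson_r(x_new, y_new)

print(f"variable {best_index}: r = {r_explore:+.3f} in the mined data")
print(f"variable {best_index}: r = {r_fresh:+.3f} in the fresh experiment")
```

The mined correlation is the largest of 149 tries, so it is biased upward by the selection itself. The fresh experiment tests a single hypothesis fixed in advance, so an ordinary significance test on it means what it says, and it only needs to measure one variable instead of 149.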