Interesting article in 'Wired'
Quote:
__________________
What brings us together is stronger than what pulls us apart |
|
|||
|
The wider community seems unaware that to say someone is "data-mining" is not a compliment; rather, it is used as an insult for those who don't know how to do nice applied statistics.
Suppose you use your computer to carry out 1000 statistical analyses. A result significant at the 0.1% level is then no longer an excellent level of correlation; it is just something you were likely to find by chance from doing so many runs. In fact, at some point, if you carry on doing runs, your ability to find anything of true statistical significance has gone out the window. That's why good statistics still involves having a small number of good theoretical priors and testing them. We already have the problem in science that publication bias means only a proportion of papers, those with nice results, get published. And those scientists probably did quite a few runs to get one with a nice result. So when you read all those papers with nice significances at the 1% level, a lot more than 1% of them arose by chance; some have estimated that nearer 30% of published medical papers report chance results as a consequence of all this bias in favour of nice-looking results. This reminds me of that recent paper saying that coffee drinking (at 5+ cups a day) is good for you, at least in the sense of reducing the risk of heart disease. Curiously, nearly all previous studies had come to the opposite conclusion. |
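The arithmetic behind the 1000-analyses point can be sketched in a few lines of Python. This is only an illustration, assuming the tests are independent and that no real effect exists in any of them; the function names are made up for the example and do not come from any statistics package.

```python
def chance_of_false_positive(alpha: float, n_tests: int) -> float:
    """Probability that at least one of n_tests independent tests of a
    true null hypothesis comes out 'significant' at level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the overall chance of any
    false positive at or below alpha (the Bonferroni correction)."""
    return alpha / n_tests


alpha, n_tests = 0.001, 1000  # the 0.1% level and 1000 runs from the post

print(f"expected number of chance hits: {alpha * n_tests:.1f}")
print(f"P(at least one chance hit):     {chance_of_false_positive(alpha, n_tests):.3f}")
print(f"Bonferroni per-test threshold:  {bonferroni_threshold(alpha, n_tests):.0e}")
```

Under those assumptions, one chance hit is expected on average and the probability of at least one is about 0.63, which is why a single 0.1% result out of 1000 runs carries little weight on its own.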
|
|||
|
I say let them try -- and may the best method win!
Quote:
__________________
"A witty saying proves nothing" Voltaire. "All your bias are belong to us" Ara Pacis. |
|
|||
|
Now, that's a little unfair. Computer-intensive methods are obviously effective in various kinds of problems. Where I get skeptical is that every time some method makes a couple of breakthroughs, somebody always goes overboard with enthusiasm, assuming that it will replace everything else in the toolbox of science. I'll believe it when I see it.
__________________
"A witty saying proves nothing" Voltaire. "All your bias are belong to us" Ara Pacis. |
|
|||
|
Good catch, Argos!
I think Chris Anderson (author of the Wired article) has it dead wrong, but I'll have to come back later to try to explain. For now, I'll just point out that by common consent the World's Greatest Theory Ever is information theory, which in a sense identifies "information" with statistical correlation, while manifestly ignoring causality. It's odd that Anderson didn't develop his theme to support his argument by invoking the genius of Shannon. Nonetheless, I think the conclusion he suggests, that science don't need no stinking models, is dead wrong. I haven't seen the talk by Peter Norvig of Google (does anyone have a link?) but if he really said the same thing, I can only say that he doesn't understand the significance of petabyte "cloud computing", which would be very odd given his position. See Entropy on the World Wide Web for some background, including (IIRC) an archived post by myself on the "acausality" inherent in Shannon's information theory. Note also that there are other information theories.
__________________
Chris Hillman Read these PF posts. Avoid Wikipedia--- except for these versions. Read this and this suggested sticky. When asked for advice, I always say: never take advice! |
|
||||
|
In addition to the correlation-is-not-causation issue (already mentioned), I would add that data mining is not magic. It has to be done in conjunction with expert knowledge of the field being studied. Conclusions that come out of the tool, for example, have to survive the "Laugh Test". The tool is not going to automatically solve all your problems for you.
__________________
"I'm as accurate as any psychic. And I'm a cartoon!" -- Squidward "Arrrgh, the laws of physics be a harsh mistress!" -- Bender |
|
||||
|
For anyone who wants to know more about data mining, here is a PDF of a presentation by Richard De Veaux on lessons learned in data mining through practical experience. He's a professor of math and statistics. I attended a short course he gave on data mining a bunch of years ago; it was quite interesting, and he is an engaging speaker.
For a free data mining tutorial, download this booklet from Two Crows, which is a really good intro to data mining.
__________________
"I'm as accurate as any psychic. And I'm a cartoon!" -- Squidward "Arrrgh, the laws of physics be a harsh mistress!" -- Bender |
|
|||
|
I'll just mention that there are 1.4 kilos of squishy pink, glucose-hungry tissue inside your skull that is mostly devoted to modeling the world. If anyone thinks we can do without models, feel free to scoop most of that stuff out. Just be sure to keep the brain stem.
|
|
|||
|
Quote:
In a situation where you have data but no a priori model, then there really isn't anything you can do but mine the data and see what you get. But don't believe you have discovered anything, oh no. What you then have to do is set up an experiment to test your hypothesis that really does have statistical power, because there just isn't any statistical power left in the data you already got. You can probably design a much cheaper experiment to focus explicitly on the hypothesis you have made. |
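The "mine first, then test on a fresh experiment" advice can be illustrated with a toy simulation. This is a sketch under loud assumptions: every variable is pure random noise (so nothing real is there to find), the sample sizes and number of candidates are arbitrary, and `mine_then_confirm` is a hypothetical helper written for this example.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mine_then_confirm(seed, n_old=50, n_new=50, n_candidates=200):
    """Mine n_candidates noise variables for the one that correlates best
    with a noise outcome in the old data, then measure that same variable
    against the outcome in a fresh, independent sample.
    Returns (correlation in mined data, correlation in fresh data)."""
    rng = random.Random(seed)
    outcome_old = [rng.gauss(0, 1) for _ in range(n_old)]
    outcome_new = [rng.gauss(0, 1) for _ in range(n_new)]
    best_old, best_new = 0.0, 0.0
    for _ in range(n_candidates):
        cand_old = [rng.gauss(0, 1) for _ in range(n_old)]
        cand_new = [rng.gauss(0, 1) for _ in range(n_new)]
        r_old = pearson(cand_old, outcome_old)
        if abs(r_old) > abs(best_old):
            # Best hit so far in the mined data; record how the same
            # candidate behaves in the fresh experiment.
            best_old, best_new = r_old, pearson(cand_new, outcome_new)
    return best_old, best_new


r_mined, r_fresh = mine_then_confirm(seed=0)
print(f"best correlation found by mining: {r_mined:+.2f}")
print(f"same variable in fresh data:      {r_fresh:+.2f}")
```

Because the winner was picked for looking good in the old data, its correlation there is inflated by selection; only the fresh sample gives an honest estimate, which for noise should hover near zero.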
|
||||
|
Quote:
The use of cloud computing to assist in the manipulation of vast amounts of data to provide statistically useful correlations is undoubtedly a useful tool in the box, and will prove its worth in areas of science which are currently hamstrung by data volume and interaction complexity, such as climate science and molecular biology. Perhaps it will be useful in analysing data from atom smashers too, and the possibility of devising new types of experiments which could take advantage of the number-crunching power of the cloud shouldn't be dismissed. However, it's not going to magically provide an elegant simplified theory adequate to the data all by itself; gifted humans are still better at integration than the fastest, biggest computer. It seems to me that the problem with Occam's razor is that using it to trim out conceptual variables at one stage may lead to more complexity at a later stage than you would have had if you'd kept an apparently unnecessary and extraneous variable at the earlier stage. The only way to find out is to go back and 'rerun' the development of the theory including the dismissed variable, to see if it comes into play and assists in finding a simpler solution further down the line. There is an understandable reluctance to do this when much is already invested in the current model, even when it becomes apparent that an impasse is looming and the difficulty and cost of experiment design increase exponentially. The most useful tool in the box is the pink squishy stuff and the fact that it can be used differently by everybody. Being able to bring it to bear on conceptual problems from many different angles, by many people able to communicate, is the human cloud. |
|
||||
|
Quote:
Monkeys at typewriters aren't Shakespeare. BTW, I ordered a trial subscription to "Wired" when it first came out. One perusal of the contents of its first issue showed it was glossy garbage and I promptly canceled. I see it hasn't changed.
__________________
A person's name, or a mark representing it, as signed personally or by deputy, as in subscribing a letter or other document. |
|
||||
|
Shannon and Data mining also don't mean what people seem to think they mean.
Anyway, just because a lot of people are apparently looking at Rick Astley's "Never gonna let you down" on YouTube doesn't mean that the song is good or should be played on the radio.
__________________
Si tacuisses, philosophus mansisses. "Half of what I say is meaningless, but I say it so that the other half may reach you." |
|
||||
|
Hmm, sounds like what we've been doing with computer-based optimization for decades, using everything from simple Monte Carlo analysis to evolutionary computation. I don't see what is so new besides the amazing computing power now available (which will pale beside what quantum computers will be doing a decade from now).
Of course, the real problem here is good old garbage-in, garbage-out. If you don't understand the solution from a theoretical perspective, it's hard to throw out nonsense. For example, if you try to have a computer empirically derive equations for the motion of the planets, you could very easily wind up with epicycles, depending on the inputs and search algorithm. They'd fit the data OK but provide no scientific insight and be useless when applied to new objects.
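The epicycle worry, a fit that matches the data it was tuned on but fails on anything new, can be shown with a toy example. This is a sketch with invented numbers rather than planetary data: the "truth" is assumed to be the straight line y = 2x, the measurements carry a fixed ±0.5 error, and both helper functions are hypothetical names for this illustration.

```python
def lagrange_predict(xs, ys, x):
    """Evaluate the unique polynomial of degree len(xs)-1 that passes
    exactly through every (xs[i], ys[i]) point: one free parameter per
    data point, so it reproduces the old data perfectly."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def line_predict(xs, ys, x):
    """Ordinary least-squares straight line: only two free parameters."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return my + slope * (x - mx)


# Eight toy measurements of y = 2x with an alternating +/-0.5 error.
xs = [float(i) for i in range(8)]
ys = [2.0 * x + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

x_new, y_true = 10.0, 20.0  # a "new object" outside the fitted range
print(f"truth at x=10:             {y_true:.1f}")
print(f"two-parameter line:        {line_predict(xs, ys, x_new):.1f}")
print(f"eight-parameter polynomial: {lagrange_predict(xs, ys, x_new):.1f}")
```

The eight-parameter polynomial fits every old measurement exactly, yet its prediction for the new point is off by thousands, while the two-parameter line lands close to 20. A search algorithm that is scored only on fit to the existing data has no way of telling the two apart.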
__________________
Do try not to take me too seriously. |
|
|||
|
There are plenty of instances where technologies of different forms have been in use with no real model around to explain them. Often the practical scientists, engineers, and inventors could predict outcomes by looking up experimental data, interpolating from it, and hammering out a few formulas by looking at patterns, without really making any models for why it worked.
It looks to me like this is just taking that method to the next level: throw more computing power, better algorithms, and more data at the problem... But often, when data and knowledge had been amassed by practical experiments, the theoretical scientists could use this to make models, which were then fed back to the practical side, helping the engineers make better things and suggesting new experiments that would provide more data, and so on. I guess my point is that the theoretical and practical sides are complementary. Sure, you can do one without the other, but the best results come from having both...
__________________
Game over, you lose, we hope you enjoyed playing the exciting game of Thermodynamics... |
|
||||
|
I'd say we set up models in an attempt to overcome our cognitive limitations. Models are not always a faithful representation of objective reality, and they always have to be adjusted to a particular experiment. If we could see natural processes occurring at a suitable speed, deduction, which we use to make models among other things, would have a less important role in the scientific method.
For instance, we make a model of cloud formation, which explains and predicts the formation of clouds in the sky [a very slow process to the human eye]. We deduce the cloud formation from a set of more basic information, such as the moisture content of the air, wind speed, temperature, and other parameters. But when you make a stop-motion movie of clouds on a given day and play it back, deduction seems to be less necessary to understand the phenomenon. The movie shows the correlations between the parameters arising before our very eyes, and deduction loses some of its weight. So I think the ability to mine datasets very quickly, where correlations arise and provide answers to our questions, is another kind of method, which could be useful in certain areas of research. It would therefore be fair to say that 'correlational methods' do not replace the scientific method, for establishing correlations from data sets is in itself an experimental process, a part of the scientific method. They enhance the scientific method rather than replace it.
__________________
What brings us together is stronger than what pulls us apart Last edited by Argos; 26-June-2008 at 07:31 PM.. Reason: Typos, style |
|
|||
|
Hah! I knew that if I just waited a few days, someone would save me the trouble of explaining a few fundamental points to Chris Anderson.
From one of the best sci journalists now writing (he's a cellular biologist IRL--- all good science writing is done by scientists, IMO): Why the cloud cannot obscure the scientific method, John Timmer, Ars Technica, June 25, 2008
__________________
Chris Hillman Read these PF posts. Avoid Wikipedia--- except for these versions. Read this and this suggested sticky. When asked for advice, I always say: never take advice! |
|
|||
|
Quote:
The foundation of modern science is Galileo Galilei’s analysis of observations to form a theory and to select a specific theory over competing theories (let’s call that the scientific method), together with his black box model technique.

Scientific Method

I believe most people understand, and would agree on, what the scientific method is. There are, however, nuances to the method, as Galileo Galilei personally discovered from the resistance to his promotion of the solar-centric hypothesis. The resistance to competing hypotheses is a topic in itself, as is how to effectively compare and develop competing theories. Humans are not robots. It is very difficult for someone to change their mind concerning a fundamental belief that has been in place since they were young. Science, as some have said, changes with a coffin, as those in authority leave and new people enter a field. The key is new data, or the rediscovery of old data, that is obviously anomalous. Anomalous facts are initially ignored until it is confirmed that they cannot be explained by observational errors or by modification of the standard mechanisms. This is why there is a natural lag between the discovery of anomalous facts and mechanism/theory change.

In the field of astrophysics and cosmology, more and better data are challenging the existing standard theory. Multiwavelength observations of astronomical objects, more large observational platforms, and public access to large digital databases are removing the practical barriers which in the past blocked smaller groups, or those with a non-mainstream hypothesis to investigate, from contributing to, or possibly overturning, a fundamental component of the standard cosmological model. The posting of papers on arXiv has provided easy and quick access to new data and analysis. The papers in this field present logical arguments based on observations. There is questioning and challenging of standard theory.

I would expect, due to the above, that there will be a major breakthrough in the connected fields of astrophysics and cosmology in the next four or five years, which will resolve the obvious anomalies and issues in the field. I would say that in the field of astrophysics and cosmology, the data deluge has not made the scientific method obsolete.

The Black Box Method & Problem

I am not sure there is agreement in this forum, or in the scientific community in general, on what Galileo Galilei’s black box method is. This is a separate subject and is a problem with physics. An argument can be made that the “string” or m-theory theorists are a symptom of the end of the usefulness of the black box model method. The development of 100^503 black box models seems to be more akin to alchemy than to science. (I.e., alchemists were intelligent. They developed specialized jargon, but they did not and could not advance science. We know that statement to be a fact now because chemistry and physics have a deeper understanding of the problems the alchemists were attempting to solve. Alchemy was a do loop that did not advance science.)

I would think most people understand that Newton’s model was a black box model. I am not sure what people think of Maxwell’s model. It also is a black box model. As to the latest models, GR/SR, quantum mechanics, and the “standard” particle model, I am not sure what people believe. I am not sure the “particle accelerator” experiment has advanced physics. The repetition of that specific experiment, creating more highly excited, very transient disturbances in space, appears not to have advanced fundamental physics. I would say that in the field of fundamental physics the theory deluge, and the type of theories in the deluge, is evidence of a field in crisis. Theoretical physics has, it appears to me, moved toward the status of alchemy. The scientific writer John Horgan created the term “ironic science” as a gentler label for the practise of alchemical science.

Ironic science is practised by intelligent people. There are thousands of very complex models to discuss and develop. Thousands of papers are written and read, but the development and discussion of the thousands of models will never advance fundamental physics. An example would be papers concerning parallel universes, two-dimensional time, time travel, 12-dimension universes, or the 100 thousand papers published per year concerning what was formerly called string theory and is now called m-theory. An argument can be made that fundamental physics has become theoretical physics where the “theories” are not theories but rather black box models, toy models, complex mathematical entities that have no connection with the physical world. |
|
|||
|
I think the link given by Chris Hillman serves to make the point clearly:
Quote:
If we considered (as I think will be the case in the very distant future) that science may become more and more directed to exploring the nature of this human representation of the absolute rather than trying to understand the ultimate underlying reality, who knows what kind of enquiry will be needed then; maybe patterns buried in vast amounts of data will be one way forward. But for the foreseeable future, models are going to be needed as the only real practical way we can improve our knowledge and make efficient practical sense of our entwined interaction with absolute reality, rather than under a misconstrued notion that they will explain the secrets of the absolute. Last edited by Len Moran; 27-June-2008 at 02:42 PM.. Reason: spelling |
|
|||
|
Quote:
__________________
"A witty saying proves nothing" Voltaire. "All your bias are belong to us" Ara Pacis. |
|
|||
|
Quote:
It would seem to me that the real path to understanding our reality (as distinct from current science, which represents it as an artificial separation of subject and object) is to try to understand and study this inseparable entwining we have with mind independent reality, as opposed to attempting to scientifically understand the absolute as an entity in itself. But how we go about that, and what sort of science it would be, I really have no idea. So it is not that we would choose to divorce ourselves from studying scientifically the nature of absolute reality; we have (to my way of thinking) no choice in the matter. But I think physics is a very long way from admitting to this possibility, and in any event, there is still much to be discovered and put to use in terms of the way science is practiced. I just don't think this approach is going to lead us to answering the most profound questions regarding the nature of reality. |
|
||||
|
Quote:
In that light, understanding the role that data mining can play in science is basically the same as understanding how to pose the right kinds of questions that data mining is good at answering, questions like "what correlates with what". If we instead approach the issue by first asking "the most profound questions regarding the nature of reality", we have no way to precondition just what constitutes such a question, and there is certainly no guarantee that data mining, or even science in general, will be relevant to answering it.
__________________
Logic is the grammar of truth. Meaning and absolute certainty are incompatible, and profound meaning and absolute certainty are profoundly incompatible. The only thing intelligence is capable of is recognizing itself. |
|
|||
|
Has anyone read John Horgan’s book “The End of Science”?
Horgan is an ex-Scientific American editor and has wide access to scientists in different fields. In his book, Horgan interviews some of the key scientists in each field, asking questions to determine the state of the field in question.

Biology

For biology, Horgan notes that the scientists in that field are resolving issues and reaching fundamental dominance of the field. In the case of biology, the facts that lead to resolution of the fundamental issues are directly observable. There are no hidden variables. Data, human analysis, and computer analytical techniques have resulted in biological models that very closely match reality. Horgan hypothesizes that in a hundred years the biological scientists will have complete dominance of their field and all of the key fundamental questions will be answerable; the field then moves toward being an engineering field rather than pure science.

“Theoretical” Physics

The following is a quote from Horgan’s book concerning his interview with physicist Sheldon Glashow. Glashow was once a leader in the search for a unified theory.
Quote:
Other books that address the crisis in theoretical physics are "The End of Physics: The Myth of a Unified Theory" by David Lindley, "Not Even Wrong: The Failure of String Theory and the Search for Unity in Physical Law" by Peter Woit, and "The Trouble with Physics: The Rise of String Theory, the Fall of a Science, and What Comes Next" by Lee Smolin. Woit's book is particularly critical of the infinite-monkey, infinite number of models in the model space of the "m-theory" field. That is where I got the 100^503 models in the space of the total number of possible m-theory models. Compare the number of m-theory models, 100^503, to the number of hydrogen atoms in the universe. There are roughly 4x10^79 hydrogen atoms in the observable Universe. To me it is obvious "Theoretical Physics" is in a crisis. |
|
||||
|
Quote:
__________________
Any day you wake up on "the right side of the dirt" is a good day. T. Anderson |
|
|||
|
Quote:
We have discussed before that science requires a separation of subject and object and in a sense I think this is part of the preconditioning that you talk about. At the classical level, separation seems to be inherent and gives rise to an idea that science is explaining reality as a separate entity to us as subjects - we can easily conceive of objective reality as actually being the "reality", along with space and time. But as you have pointed out so often, dig a little deeper at the classical level and it becomes clear that the subject object separation is artificial, but we carry on doing science successfully anyway. But it is successful because we have framed the questions in a way that can be answered by science, but this doesn't invalidate the discipline at all, it just places science within a category of enquiry that has domains of validity. We shouldn't expect science to be able to operate outside of these areas and thus it cannot access our reality (when thought of as an inseparable entity consisting of the collective human existence and mind independent reality). But if we frame the questions correctly we can do science with this inseparable entity by separating the subject and object artificially, stand back and then do the experiments. The results we get are not to be thought of as probing the secrets of the absolute, but they are probing the secrets of our involvement with the absolute in a very objective way. And I find this a very satisfactory way of looking at science, we don't have to become despondent at the likely failure of science to unearth the secrets of mind independent reality, instead we should embrace what science is able to achieve and will continue to achieve within its area of validity and not expect any more than this. 
I keep on falling into the same old trap of wanting explanations that can explain the absolute. In my earlier post I fondly imagined that perhaps data mining on a massive scale might help to unravel what I see as our inseparable involvement with mind independent reality, and perhaps give some scientific ideas on what mind independent reality actually is. But as you say, I am looking for scientific answers to explain mind independent reality as if the scientific answers were the ultimate truth. I can only get such answers if I frame the question properly, but if I do that then I have lost my original quest for an ultimate truth; I have converted it into a quest for a scientific truth. So I would say the ultimate truth of mind independent reality is of a nature that cannot be defined. Perhaps all we should really hope to say about it is that "something" is there, and just carry on discovering very important "scientific truths". |