|
|||
|
What are you trying to do?
Are you (1) (a) certain that your data is generated by an exponential decay process, and merely concerned with the level of confidence you have in the parameters the regression gives you? And, if so, are you (b) sure that, after taking logs, the errors are iid (independently and identically distributed)?

...OR...

(2) are you actually uncertain whether the data is well modelled by an exponential decay process, or do you want to test the model specification?

If 1 (a) and (b) are both yes, then a textbook will provide the methods for giving you confidence intervals on the parameters, remembering that they will apply to the model you estimated, ie, in log form. In fact you can probably read off what you need from a basic stats package, eg the one provided with Excel, if you know what you are looking for.

If 1 (a) is yes but (b) is no, then to use OLS (ordinary least squares, ie, bog-standard regression) you will have to estimate a different model, respecified so that your errors are iid. Or find a different estimation method that matches your model. A textbook may give you ideas on what to do. I'm not saying this will be easy.

If 1 (a) is no, then you are in case 2. You are a bit more on your own now. I suggest you look at a textbook for tests of model specification. Myself, I would probably start, as a first cut, with a test for heteroscedasticity (can also be spelled with a k). Heteroscedasticity is when the errors do not have constant variance. This can be the result either of the errors not being iid (ie, 1 (b) = no), or else of model mis-specification, so that the measured errors are not the true errors.

btw, that's about as much as I am willing to say. I'm not going to do this for you, nor am I going to provide lessons in understanding the above.
|
||||
|
I have a better description of my problem and an idea in which direction I want to go.
Below I generated a fictional dataset: There are two datasets (data1 and data2). Both had a line fitted to them (slope and intercept shown below the table). The deviation from the fitted line is shown as resid1 and resid2 (residuals). I generated this data so the residuals would be the same in both cases, but the slope is very different.

Code:
         data1   resid1   data2   resid2
0.5      3.03    -0.23    2.90    -0.23
1.5      3.51     0.23    3.10     0.23
2.5      3.11    -0.18    2.44    -0.18
3.5      3.43     0.12    2.49     0.12
4.5      3.60     0.28    2.38     0.28
5.5      3.21    -0.13    1.72    -0.13
6.5      3.46     0.11    1.70     0.11
7.5      3.16    -0.20    1.14    -0.20

Slope    0.01             -0.26
R2       0.0233           0.9031
Stdev    0.206            0.206

I used the standard deviation of the residuals to describe this. But I am still kind of longing for some "official" parameter that will "talk" to me in the same way as R-square. I.e. "0.9999 veeery good fit, 1 perfect fit, 0.02 very poor fit".

Thanks
__________________
Life is unfair. But that's ok.. as long as you make sure it's unfair in your favour. -Me You don't plan sincerity. You have to make it up on the spot. -Denny Crane I never make predictions, and I never will. -some footballer |
|
|||
|
You keep talking about "goodness of fit". But to investigate it, you have to be more precise. Goodness of fit is not well-defined. There are many different kinds of badness. R2 can be high and the model can still be useless; it is easy to exhibit examples of this. In specific circumstances, R2 is a useful measure of something. Also we need to be sure you are using appropriate statistical methods for your model. Hence my Qs 1 and 2.

In the particular case of your data, assuming the answers to Q 1(a) and (b) are yes and yes, what has happened is that the confidence interval around 0.01 is much larger (in absolute terms) than 0.01. The errors (although the same size as in the second example) are large in comparison to this weak relationship - they swamp it. So on this data we cannot even be confident that the slope is positive rather than negative; indeed we are not even sure that there is a relation. That is what the low R2 is (in this case, assuming my assumptions are OK) telling you.

In the second data set, although the errors are the same size, the much stronger relationship (-0.26) means that the confidence interval around -0.26 is fairly small in comparison to 0.26 (absolute value). The relationship is strong enough that we can pick it up, even with errors of this size in the data. So we are pretty sure that the slope is negative, there is a relationship, etc.

Loadsa different stat parameters exist that answer different questions in different circumstances. But you have to frame your question accurately and describe your circumstances in order to choose one. I'm not going to do that choosing for you. I have already said more than I said I was going to. I suspect you are in fact looking for the confidence interval. Head for textbook, wikipedia, mathworld, etc.
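For anyone following along, the slope confidence intervals described above can be checked with a few lines of Python. This is only a sketch under the assumptions already stated (plain OLS, iid normal errors); the function name `fit_line`, the variable names, and the choice to treat the unlabeled first column as x are mine, and the t critical value is hard-coded from standard tables rather than computed.

```python
import math

# Fictional data posted earlier in the thread; X is the unlabeled first column.
X = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]
DATA1 = [3.03, 3.51, 3.11, 3.43, 3.60, 3.21, 3.46, 3.16]
DATA2 = [2.90, 3.10, 2.44, 2.49, 2.38, 1.72, 1.70, 1.14]

# Two-sided 95% critical value of Student's t for n - 2 = 6 degrees of freedom,
# taken from standard tables to keep the sketch dependency-free.
T_CRIT_DF6 = 2.447


def fit_line(x, y, t_crit):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line.
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    r2 = 1.0 - ssr / syy
    resid_sd = math.sqrt(ssr / (n - 2))   # residual standard error
    slope_se = resid_sd / math.sqrt(sxx)  # standard error of the slope
    half_width = t_crit * slope_se
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": r2,
        "resid_sd": resid_sd,
        "slope_se": slope_se,
        "slope_ci": (slope - half_width, slope + half_width),
    }


for name, y in (("data1", DATA1), ("data2", DATA2)):
    fit = fit_line(X, y, T_CRIT_DF6)
    lo, hi = fit["slope_ci"]
    print(f"{name}: slope={fit['slope']:.3f}, R2={fit['r2']:.3f}, "
          f"resid_sd={fit['resid_sd']:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

On this data the interval for data1 straddles zero, while the one for data2 stays well below zero. Note that the residual standard error here divides by n - 2, so it comes out a little larger than the 0.206 quoted in the table.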
|
||||
|
Quote:
__________________
"I'm as accurate as any psychic. And I'm a cartoon!" -- Squidward "Arrrgh, the laws of physics be a harsh mistress!" -- Bender |
|
||||
|
Thanks for all the replies.
Ivan: I hadn't seen your reply yet when I posted the second time. Thanks for all your input. You gave me some very good pointers (i.e. iid). I'll need some time to look into that further though.

To address your questions: I am fairly certain that a linear fit is the right way to go. In my actual data I expect exponential decay; when fitting the line I use the log of the values, which (if exponential decay is correct) gives a straight line. Quote:
The problem is that in some cases the decay is relatively slow. A half life calculation done on a line with smaller slope is of course less accurate, but to show that my data is actually close to a straight line, I would like to have some value to present basically showing that the R2 is low not because I screwed up some samples but because the slope is small. Quote:
geonuc, no worries. It made me double check, which is always a good idea. It's bedtime for me. I'll check back tomorrow again. Thanks again for all the help.
|
||||
|
Blitzak! I thought this thread would be about Star Wars! Y'know, R2 and all...
__________________
"If you think the LHC will create black holes, you might as well believe Hobbits are at the bottom of your garden."- Dr. Mike Inglis Rovers forever! - ToSeek "Carl Sagan sent a message to ET, Neil Armstrong walked in the Sea of Tranquility Steve Squyers built Spirit and Opportunity Dan Haylen upchucked in zero gravity." -Brent Simon, The Space Camp Song |
|
||||
|
It's more like Var Wars.
Unfortunately I'm not quite sure how to use the force.
|
||||
|
Quote:
See this page for more on the problem and possible other solutions.
__________________
And the "driving on the freeway on a scooter" analogy still holds true because the pilots are sitting in 7 to 30 ton aircraft o' doom and you are running around them in your very own Meatbody, Mark I. Beep, beep. Big Don Trying to make sense of computers, The Error Log.
|
|
|||
|
I don't think you used the word "confidence interval" in that in a helpful way, but never mind.
But good on you for finding an explicit case of what might be one problem. If your errors are initially iid and you take logs, then the errors aren't iid any more. So "take logs and use OLS" can produce a very misleading, possibly nonsense, result in that case. But in certain kinds of data it can be more plausible that the errors are iid after you take logs. So in that case, "take logs and use OLS" is fine, indeed best. But you need to know which. It is a question of understanding your model, including the error process.

But I think there are other issues too that can result in this being totally the wrong way to do it. What concerns me about modelling an exponential decay process is not that there are measurement errors (which there probably are), but that the decay process itself is random. Suppose we have a radioactive source, a small quantity of fast decaying stuff. If by chance rather more than the expected amount decays in one period, as it can, then the decay process in effect restarts with a rather smaller quantity of material left than was expected. So if our data is measured as "the amount left" or "the amount seen to decay", it is nearly always going to be below the initially expected trend after this point. We need to use special methods to model this kind of process.

This isn't just theoretical stuff. This kind of issue occurs all the time. I've had a real life example in business modelling where the failure to recognise that errors were not iid produced a completely different estimate of what were typical cost overruns for projects of different sizes, once the data had been transformed to give a reasonable assumption of iid errors.
|
|||
|
Quote:
The problem is not with the independence of the errors, but rather with their distribution. The confidence intervals are based on the assumption that the errors are normally distributed. But if the raw errors are normal, then their logs are no longer normal. And vice-versa: if the logs are normal, then the "best" confidence interval for the log-errors does not correspond to the best confidence interval for the raw errors, in general.
__________________
"All your bias are belong to us." Ara Pacis "A witty saying proves nothing." Voltaire |
|
|||
|
The second i in iid is "identically". You are correct, the problem is with the identically bit, not the independently bit. I had no intention of implying otherwise.
You are right that changing normal to log normal is a problem. But usually a bigger problem is that after you take logs the variances become quite different from one data point to the next.
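To make that last point concrete, here is a rough first-order sketch. The symbols are my own assumptions, not from anyone's actual data: $A$ is the initial amount, $k$ the decay rate, $\varepsilon_i$ an additive iid error with variance $\sigma^2$ that is small relative to the signal, and $\eta_i$ a multiplicative iid error.

```latex
% Additive iid errors on the raw scale:
y_i = A e^{-k t_i} + \varepsilon_i
\quad\Longrightarrow\quad
\log y_i = \log A - k t_i + \log\!\left(1 + \frac{\varepsilon_i}{A e^{-k t_i}}\right)
\approx \log A - k t_i + \frac{\varepsilon_i}{A e^{-k t_i}},
\qquad
\operatorname{Var}(\log y_i) \approx \frac{\sigma^2}{A^2}\, e^{2 k t_i}.

% Multiplicative iid errors on the raw scale:
y_i = A e^{-k t_i}\, e^{\eta_i}
\quad\Longrightarrow\quad
\log y_i = \log A - k t_i + \eta_i,
\qquad
\operatorname{Var}(\log y_i) = \operatorname{Var}(\eta_i).
```

In the additive case the variance of the logged data grows like $e^{2 k t_i}$, so late points are much noisier than early ones and plain OLS on the logs weights them wrongly. In the multiplicative case the logged errors keep a constant variance, which is the situation where "take logs and use OLS" is appropriate.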
|
||||
|
Thanks for all the help, sorry for taking so long with my reply.
Ivan: It isn't actually radioactive decay I'm working with. Would the randomness still be a problem if all I want to do is calculate the half life?

I went through my data again and recalculated some of the statistical values based on some of the input here. The problem I have now might be a bit easier to explain (hopefully).

- I have exponential decay data, from experiments under two conditions (5 repetitions for each condition).
- For each experiment I calculate the half life based on the fitted curve.
- With a t-test I can check whether the half lives are different under the different conditions.

The problem I am thinking about now is that the t-test is using the absolute values and is not taking into account the confidence interval for each individual value. Is there a commonly used method for this?

It's not like my data look all that random; my half life calculations have R-squares of at least .86 (AVG .97). If I directly use the half lives in the t-test I get a significant difference between the two conditions. I was thinking of taking the lowest probable half lives (calculated from the edge of the 95% confidence interval) for the slower condition and t-testing those against the highest probable half lives for the faster condition. But there are a whole bunch of issues with that method.

Thanks
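For reference, the "edge of the 95% confidence interval" half lives mentioned above can be read off the slope's interval as in the sketch below. It assumes natural logs were used in the fit (with base-10 logs, replace ln 2 with log10 2), and the function name `half_life_interval` and the numeric inputs are hypothetical, chosen only to be roughly the size of the fictional data2 fit earlier in the thread. It does not answer the question of how to fold these intervals into the t-test.

```python
import math


def half_life_interval(slope, slope_se, t_crit):
    """Return (half_life, lower, upper) from a fitted log-linear slope.

    The decay rate is k = -slope and half_life = ln(2) / k. The bounds map
    the endpoints of the slope's confidence interval through the same formula.
    """
    k = -slope
    if k <= 0:
        raise ValueError("slope must be negative for a decay process")
    k_low = k - t_crit * slope_se
    k_high = k + t_crit * slope_se
    half_life = math.log(2) / k
    lower = math.log(2) / k_high
    # If the interval for k reaches zero, the half-life has no finite upper bound.
    upper = math.log(2) / k_low if k_low > 0 else math.inf
    return half_life, lower, upper


# Hypothetical inputs: t critical value for 6 degrees of freedom, and a slope
# standard error of 0.034 for both a fast and a slow decay.
T_CRIT = 2.447
print(half_life_interval(-0.26, 0.034, T_CRIT))  # fast decay
print(half_life_interval(-0.05, 0.034, T_CRIT))  # slow decay
```

With the same slope standard error, the slow decay has an interval for k that reaches zero, so its half life has no finite upper bound. That is the sense in which a half life from a shallow line is less accurate, even when the scatter around the line is no worse.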