
fitted_interval_preds_plot #7

Open
mgm-cincy-epa opened this issue Dec 16, 2024 · 7 comments

@mgm-cincy-epa
Collaborator

In fitted_interval_preds_plot, chunk 16 of script 06, I am confused about the data frames being used. In the initial assignment, spatResids_preds is created from the residuals of the a5ssn_wswq_reml1 model fit to the 199 obs, and resid_SD is then created from spatResids_preds. However, the third assignment of fitted values uses the apredict1ws data frame, which contains fitted values from the 48 prediction sites. Why are the 48 predicted values being mixed with the resid_SD from the 199 obs? Should resid_SD originate from the residuals of the 48 prediction sites?
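To illustrate, here is my reading of the flow in chunk 16 (a sketch, not the actual chunk; object names are from script 06, and the interval line is my guess at the construction):

```r
# Residuals come from the model fit to the 199 obs sites
spatResids_preds <- residuals(a5ssn_wswq_reml1)
resid_SD <- sd(spatResids_preds)

# ...but the interval is then built around the 48 prediction-site fits,
# i.e. something like: apredict1ws$.fitted +/- qnorm(0.95) * resid_SD
```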

@mgm-cincy-epa
Collaborator Author

Relatedly, I have questions about the TLH fitted-to-obs interval plot. First, the three statements below return the exact same residuals:

spatResids <- residuals(z4ssn_wswq_reml1, cross.validation = T)
head(round(aug_z4ssn_wswq_reml1$.fitted))
zspatResids <- residuals(z4ssn_wswq_reml1)

The cross.validation = T argument you specified does not seem to be doing anything. If we need leave-one-out cross-validated residuals for your fitted interval plot, should we be using the loocv function and specifying cv_predict = T or se.fit = T?
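For reference, this is how I read the loocv() interface (a sketch; the list return structure with $cv_predict and $se.fit elements is my reading of the SSN2/spmodel docs, and "response" is a placeholder for the actual response column):

```r
library(SSN2)

# With cv_predict = TRUE and se.fit = TRUE, loocv() returns the
# leave-one-out predictions and their standard errors alongside the
# usual summary statistics.
cv_out <- loocv(z4ssn_wswq_reml1, cv_predict = TRUE, se.fit = TRUE)
str(cv_out$cv_predict)  # LOOCV predictions at the obs sites
str(cv_out$se.fit)      # their standard errors

# LOOCV residuals would then be observed minus cv_predict:
# cv_resids <- obs_data$response - cv_out$cv_predict
```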

@mgm-cincy-epa
Collaborator Author

Also, summary(spat.Pred.median) returns the exact same results as summary(aug_z4ssn_wswq_reml1$.fitted). I guess that is to be expected(?). What we are after is the 5th and 95th percentiles around that fitted value.
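For what it's worth, one way to get those bounds (a sketch, assuming empirical residual quantiles are acceptable; object names as above):

```r
# Empirical 5th/95th percentile bounds around the fitted values
q <- quantile(residuals(z4ssn_wswq_reml1), probs = c(0.05, 0.95))
lower_5th  <- aug_z4ssn_wswq_reml1$.fitted + q[["5%"]]
upper_95th <- aug_z4ssn_wswq_reml1$.fitted + q[["95%"]]
```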

@Travis-Neptune
Collaborator

Hey Mike,

Thank you for bringing that up.

  1. Agreed, the cross.validation = T argument doesn't need to be there anymore; it's a vestige of the old John/SSN1 code that I don't believe was hurting anything, but I agree it should be removed.
  2. Yep, I totally missed that copy-paste error using the residuals from the 199 obs rather than the 48 preds. I've adjusted the code in chunk 16 to reflect this.
  3. Yes, that median could instead just be the fitted value, and what we particularly want is the upper and lower quantiles. If you'd like that changed for clarity, let me know, but I've left it as is for the moment.

The changes to chunk 16 and the removal of cross.validation = T from chunk 11 have been pushed under the interval_preds_upd branch.

@mgm-cincy-epa
Collaborator Author

mgm-cincy-epa commented Dec 18, 2024 via email

@Travis-Neptune
Collaborator

Hey there,

I don't see an attached HTML file, but I'll try to respond without it. I can look deeper into the CrossValStdErr / loocv stuff; I do remember that in my initial pass moving things from SSN1 to SSN2 there was a reason I felt it didn't matter for us. I can dig into this further, but I wouldn't expect it to change our results much, if at all, given the relatively stable residuals we see on both our obs and preds points.

Anyway, I'll look into it and get back to you either tomorrow or next week with what I find.

@Travis-Neptune
Collaborator

Okay, so I've gone through and added versions (directly after the existing fitted interval plot sections, for both the obs and the preds data) that follow the LOOCV method John was utilizing, where applicable. The actual LOOCV procedure of estimating a residual for an observation really only applies to the obs portion: it effectively creates n versions of the model, each with one of the obs left out, and then predicts that held-out observation (see the sketch below).

For the fitted interval on the preds, on the other hand, the fitted values would not change because the model is fixed, so this portion really does not affect our figures all that much.
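Here is the LOOCV idea in miniature, illustrated with a plain lm() on a built-in dataset for brevity (an SSN2 model follows the same n-refits logic, which loocv() handles internally):

```r
# Leave-one-out cross-validation: refit the model n times, each time
# dropping one observation, then predict that held-out observation.
n <- nrow(mtcars)
cv_pred <- numeric(n)
for (i in seq_len(n)) {
  fit_i <- lm(mpg ~ wt + hp, data = mtcars[-i, ])      # drop obs i, refit
  cv_pred[i] <- predict(fit_i, newdata = mtcars[i, ])  # predict obs i
}
cv_resid <- mtcars$mpg - cv_pred  # LOOCV residuals
```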

The bigger difference, and one that I rather intentionally changed, is in regard to the se.fit values. The newly provided versions use the se.fit value for each individual observation rather than the cumulative standard deviation of the residuals for the dataset. What this means is that my method looks at roughly the interval we expect across the whole model, whereas John's method looks at each individual point.

The result is the smooth error bounds visible in the plots with my method, versus the very squiggly bars with John's method.

Both have their uses, but I would argue that mine is more straightforward and useful for model validation, whereas John's is more of an introspection on which data points the model believes we know more or less about.
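To make the contrast concrete, here is a sketch of the two constructions (object and column names are placeholders in the spirit of the script, not necessarily its actual code, and the loocv() list return is my reading of the docs as above):

```r
z90 <- qnorm(0.95)  # half-width multiplier for a 5th-95th percentile band

# Constant half-width from the whole-dataset residual SD -> smooth bounds
resid_SD <- sd(residuals(z4ssn_wswq_reml1))
upper_smooth <- aug_z4ssn_wswq_reml1$.fitted + z90 * resid_SD
lower_smooth <- aug_z4ssn_wswq_reml1$.fitted - z90 * resid_SD

# Per-observation se.fit from loocv() -> point-by-point ("squiggly") bounds
cv_out <- loocv(z4ssn_wswq_reml1, cv_predict = TRUE, se.fit = TRUE)
upper_point <- cv_out$cv_predict + z90 * cv_out$se.fit
lower_point <- cv_out$cv_predict - z90 * cv_out$se.fit
```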

@mgm-cincy-epa
Collaborator Author

mgm-cincy-epa commented Dec 27, 2024 via email
