fitted_interval_preds_plot #7
Relatedly, I have questions about the TLH fitted to obs interval plot. First, the three statements below return the exact same residuals. The cross.validation=T argument you specified does not seem to be doing anything. If we need leave-one-out cross-validated residuals for your fitted interval plot, should we be using the loocv function and specifying cv_predict = T or se.fit = T?

Also, summary(spat.Pred.median) returns the exact same results as summary(aug_z4ssn_wswq_reml1$.fitted). I guess that is to be expected(?). What we are after is the 5th and 95th percentiles around that fitted value.
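For concreteness, pulling leave-one-out residuals via loocv might look like the minimal sketch below, assuming z4ssn_wswq_reml1 is the fitted ssn_lm model behind aug_z4ssn_wswq_reml1 and that `response` stands in for the actual response column; check names(cv_out) against the installed SSN2 version:

```r
library(SSN2)

# Leave-one-out cross validation on the fitted ssn_lm model; requesting
# cv_predict and se.fit returns the LOOCV predictions and their standard
# errors alongside the usual summary statistics.
cv_out <- loocv(z4ssn_wswq_reml1, cv_predict = TRUE, se.fit = TRUE)

cv_pred <- cv_out$cv_predict  # leave-one-out predictions
cv_se   <- cv_out$se.fit      # their standard errors

# LOOCV residuals: observed response minus the leave-one-out prediction.
# `response` is a placeholder for the actual response column name.
loocv_resid <- aug_z4ssn_wswq_reml1$response - cv_pred
```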
Hey Mike,

Thank you for bringing that up.

1. Agreed, the cross.validation=T doesn't need to be there anymore; that was a vestige of the old John/SSN1 code that I don't believe was hurting anything, but I agree it should be removed.
2. Yep, I totally missed that copy-paste error using the residuals from the 199 obs rather than the 48 preds. I've adjusted the code in chunk 16 to reflect this.
3. Yes, that median could instead just be fitted, and what we particularly want is the upper and lower quantiles. If you'd like that changed for clarity let me know, but I've left it as is for the moment.

The changes to chunk 16 and the drop of cross.validation=T from chunk 11 have been pushed under the interval_preds_upd branch.
Hi Travis,

We need to talk about these points some more, especially point 1. Look at the attached html and do Ctrl+F on cross.validation. That should return 12 hits. When you get to the 12th cross.validation = TRUE, read the comments in code chunk 4.3.2.0.0.2 CodeChunk 53: spatPredictiveDistr. The _CrossValStdErr_ is what has me thinking we need to pull that term out of the loocv function in SSN2.

_CrossValStdErr_ occurs 9 times in the html when cross.validation = T is used. Can you please crosswalk between SSN and SSN2 on how to get _CrossValStdErr_ and whether we need it?

Thanks, Mike
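For reference, a hedged sketch of the crosswalk being asked about; the _CrossValPred_ column name and the glmssn_fit object are assumptions about the old SSN1 workflow, not taken from this thread, so verify both against each package's documentation:

```r
# SSN1 (old workflow): residuals() with cross.validation = TRUE attached
# cross-validated columns to the influence object's point data.
resid_ssn1 <- residuals(glmssn_fit, cross.validation = TRUE)  # glmssn_fit: hypothetical SSN1 model
cv_df <- getSSNdata.frame(resid_ssn1)
cv_df$`_CrossValPred_`    # leave-one-out predictions (assumed column name)
cv_df$`_CrossValStdErr_`  # their standard errors

# SSN2 (current workflow): the analogous quantities come from loocv().
cv_out <- loocv(z4ssn_wswq_reml1, cv_predict = TRUE, se.fit = TRUE)
cv_out$cv_predict  # analogue of _CrossValPred_
cv_out$se.fit      # analogue of _CrossValStdErr_
```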
Hey there,

So I don't see an attached html, but I'll try to respond without it. I can look deeper into the CrossValStdErr / loocv stuff, but I do remember from my initial pass moving things from SSN1 to SSN2 that there was a reason I felt it didn't matter for us. I can look at this more deeply, but I wouldn't expect it to change our results much, if at all, given the relatively stable residuals we see on both our obs and preds points. But anyway, I'll take a look into it and get back to you either tomorrow or next week about what I find.
Okay, so I've gone through and added in versions (directly after the existing fitted interval plot sections, both for the obs and then the preds data) that follow the LOOCV method John was utilizing, where applicable.

The actual LOOCV situation of finding an estimated residual for an observation based on LOOCV really only applies to the obs portion, as it's effectively creating n versions of the model, each with one of the obs missing, and then fitting the result to it. Whereas when we're looking at our fitted interval on the preds, the actual fitted values would not change because the model is fixed. This portion really does not affect our figures all that much.

The bigger difference, and one that I'd rather intentionally changed, is in regard to the se.fit values. The newly provided versions use the se.fit values for each individual observation rather than the cumulative standard deviation of the residuals for the dataset. What this means is that with my method we're looking at roughly the interval we're expecting across the whole model, whereas John's method is looking at each individual point. The result is the smooth error bounds visible in the plots with my method, versus the very squiggly bars with John's method.

Both have their uses, but I would argue that mine is more straightforward and useful in terms of model validation, whereas John's is more of an introspection on which data points the model believes we know more or less about.
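To illustrate the contrast, a sketch of the two interval constructions, assuming an illustrative data frame fit_df with columns observed, .fitted, and se_fit rather than the exact objects in script 06:

```r
library(ggplot2)

# Order by the x variable so the ribbons draw cleanly left to right.
fit_df <- fit_df[order(fit_df$.fitted), ]

z90      <- qnorm(0.95)                           # two-sided 90% multiplier
resid_sd <- sd(fit_df$observed - fit_df$.fitted)  # one SD for the whole dataset

ggplot(fit_df, aes(x = .fitted, y = observed)) +
  geom_point() +
  # "My method": a single residual SD gives smooth, parallel bounds.
  geom_ribbon(aes(ymin = .fitted - z90 * resid_sd,
                  ymax = .fitted + z90 * resid_sd),
              alpha = 0.2) +
  # "John's method": per-observation se.fit gives squiggly bounds.
  geom_ribbon(aes(ymin = .fitted - z90 * se_fit,
                  ymax = .fitted + z90 * se_fit),
              alpha = 0.2, fill = "red")
```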
Hi Travis,
Thanks for your reply and the code for the fitted interval preds plot. I successfully merged your branch, interval_loocv_options, into main and then pulled it down to my local repository. For now, I left your branch up there. Here are the changes I made to script 06. After you have read about the changes, would you please reply by email to my questions below, and let me know when you would be available to discuss script 06?
Thanks,
Mike
1. I applied TLH fitted to obs interval plot to chunk 11 (fitted_interval_obs_plot) and TLH fitted to obs interval plot (LOOCV method) to chunk 12 (fitted_interval_plot_loocv). I got the same plots you did and have commented those chunks out, as the z4 model in those chunks is not the model I am focused on.
2. I added a chunk 13, loocv_pts_outside, which uses code from John Carson that identifies the point outside the prediction interval. That chunk is also commented out.
3. The model of interest is in chunk 15 under Distance Hypothesis, a5ssn_wswq_ml1, so please direct any future coding to that model. My shorthand reference for that model is the a5 model.
4. Chunks 17 and 18 apply your TLH fitted to obs interval plot code for the a5 model and the TLH fitted to obs interval plot (LOOCV method), respectively, to the 199 obs. Chunk 19 identifies obs that fall outside the loocv 90% prediction limits.
5. Chunk 20, distance hypothesis preds, augments the a5ssn_wswq_reml1 model with predictions. Please note the arguments used with augment.
6. Chunk 21, ribbon pred interval, is a new addition, as I used the output from the augmented predictions with geom_ribbon to make a prediction interval plot. Using the SSN2 augmented predictions with geom_ribbon produces a squiggly prediction interval similar to John Carson's and your code. Here are my questions about this code:
6a) Is this code producing a 95% or 90% prediction interval? Level = 0.95 was set in the augment function, so I think it is a 95% prediction interval.
6b) Are the spatial prediction percentiles using qnorm from the JC and TLH code more accurate? The ribbon prediction interval identifies only 1 point outside the interval, whereas the spatial prediction percentiles identify 5 (see chunk 24). I need you to explain what qnorm is doing (see the sketch after this list).
7. Chunks 22 and 23 apply the straight-line and squiggly-line prediction intervals.
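On 6a and 6b: with level = 0.95 in augment, the ribbon is a 95% interval, while 5th/95th percentiles from qnorm give a 90% interval; the narrower 90% band alone may explain why more points fall outside it. A minimal sketch of what qnorm is doing, with fitted_vals and se_vals as illustrative names:

```r
# qnorm() is the normal quantile function: it returns the z-score below
# which the given proportion of a normal distribution falls.
qnorm(0.95)  # ~ 1.645, the 95th-percentile z-score
qnorm(0.05)  # ~ -1.645, the 5th-percentile z-score

# The JC/TLH-style percentile bounds convert a fitted value and a standard
# error into 5th/95th percentiles, i.e. a two-sided 90% interval:
lower90 <- fitted_vals + qnorm(0.05) * se_vals
upper90 <- fitted_vals + qnorm(0.95) * se_vals

# By contrast, a 95% interval uses ~qnorm(0.975) = 1.96 standard errors on
# each side, so it is wider and flags fewer points as outside.
```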
In fitted_interval_preds_plot, chunk 16 of script 06, I am confused about the data frames being used. In the initial assignment, spatResids_preds is created from the residuals of the a5ssn_wswq_reml1 model on the 199 obs. The resid_SD is created from spatResids_preds. However, in the third assignment of fitted values, the apredict1ws data frame, which contains fitted values from the 48 prediction sites, is used. Why are the 48 predicted values being mixed with the resid_SD from the 199 obs? Should the resid_SD originate from the residuals of the 48 predicted sites?
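As a sketch of the pattern being questioned, using the object names from chunk 16; the qnorm multiplier and exact assignments are assumptions, not the verbatim script 06 code:

```r
# The SD comes from the residuals at the 199 observation sites...
spatResids_preds <- residuals(a5ssn_wswq_reml1)  # residuals on the 199 obs
resid_SD <- sd(spatResids_preds)

# ...but the interval is then drawn around the 48 prediction-site fitted
# values, mixing the obs-based SD with the preds-based fits.
preds_lower <- apredict1ws$.fitted - qnorm(0.95) * resid_SD
preds_upper <- apredict1ws$.fitted + qnorm(0.95) * resid_SD
```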