review code used to pull NWIS data #97

Open
jordansread opened this issue Mar 13, 2017 · 7 comments

@jordansread

jordansread commented Mar 13, 2017

https://github.com/USGS-VIZLAB/gages-through-ages/blob/master/scripts/fetch/getData_siteRecords.R

One thing I am noticing is that we have some duplicated rows:

library(dplyr) # using data.in from process.disch_sites
filter(data.in$`disch-data`) %>% distinct() %>% nrow
[1] 582882
filter(data.in$`disch-data`) %>% nrow
[1] 583902

here is one site in particular:

filter(data.in$`disch-data`, site_no == '02053200') %>% .$year %>% length()
[1] 116
filter(data.in$`disch-data`, site_no == '02053200') %>% .$year %>% unique() %>% length()
[1] 58
@jordansread
Author

Looks like nDays is creating additional non-unique rows:

filter(data.in$`disch-data`, site_no == '02053200') %>% select(-nDays)  %>% distinct() %>% nrow
[1] 58
filter(data.in$`disch-data`, site_no == '02053200') %>% distinct() %>% nrow
[1] 116
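One way to confirm that nDays is the column that differs is to count, per site/year pair, how many rows and how many distinct nDays values there are. This is only a sketch on a made-up data frame: `toy` and its values are invented for illustration, and the only assumption about the real `data.in$`disch-data`` table is that it has site_no, year and nDays columns.

```r
library(dplyr)

# Hypothetical stand-in for data.in$`disch-data`; all values are invented.
toy <- data.frame(
  site_no = c("A", "A", "A", "B"),
  year    = c(2000, 2000, 2001, 2000),
  nDays   = c(365, 200, 365, 366),
  stringsAsFactors = FALSE
)

# For each site/year key, count the rows and the distinct nDays values,
# then keep only the keys where nDays disagrees between rows.
conflicts <- toy %>%
  group_by(site_no, year) %>%
  summarise(n_rows = n(), n_nDays = n_distinct(nDays), .groups = "drop") %>%
  filter(n_nDays > 1)
```

Running the same pipeline on the real table would list every site/year pair whose rows disagree on nDays, which may help narrow down where in the fetch script the extra rows come from.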

@jordansread
Author

On top of the duplicates that distinct() removed above, if we keep only site_no and year we have 17,737 duplicated rows:

select(site_no, year) %>% distinct %>% nrow
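The count of extra rows can be read as the difference between the fully de-duplicated table and the table de-duplicated on the key columns only. A minimal sketch of that arithmetic, again on an invented `toy` data frame rather than the real data (the exact pipeline used in the thread is not shown, so this is an assumption about how the number was derived):

```r
library(dplyr)

# Hypothetical stand-in for data.in$`disch-data`; all values are invented.
toy <- data.frame(
  site_no = c("A", "A", "A", "A", "B"),
  year    = c(2000, 2000, 2000, 2001, 2000),
  nDays   = c(365, 365, 200, 365, 366),
  stringsAsFactors = FALSE
)

# Step 1: drop rows that are identical in every column.
full_dedup <- distinct(toy)

# Step 2: keep one row per site_no/year key.
key_dedup <- distinct(full_dedup, site_no, year)

# Rows that survive step 1 but share a key with another row.
n_extra <- nrow(full_dedup) - nrow(key_dedup)
```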

@jordansread
Author

jordansread commented Mar 13, 2017

These sites seem to be missing too:
https://waterdata.usgs.gov/pr/nwis/inventory/?site_no=50063440
https://waterdata.usgs.gov/hi/nwis/uv?site_no=16229000
https://waterdata.usgs.gov/nwis/inventory/?site_no=02430615

This one is in HCN (hydro-clim), but has limited data:
https://waterdata.usgs.gov/nwis/inventory/?site_no=04127918

And these are only the sites that I am cross-referencing with hydroclim, so there are likely more. It would help to figure out why these didn't end up in our data (maybe with the exception of the last one).

jordansread pushed a commit to jordansread/gages-through-ages that referenced this issue Mar 13, 2017
@ldecicco-USGS
Contributor

So, not sure why they wouldn't show up in the site file, but that's why they're not showing up in our data. I'll keep poking around.

@ldecicco-USGS
Contributor

Pulling internal data instead of external gets some of these:

> "50063440" %in% x$site_no
[1] TRUE
> "16229000" %in% x$site_no
[1] TRUE
> "02430615" %in% x$site_no
[1] FALSE
> "04127918" %in% x$site_no
[1] TRUE
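The four checks above can also be done in one pass, which scales better if more sites from the hydroclim cross-reference need checking. A small base R sketch: the `x` below is a hypothetical stand-in built to mirror the results reported above (the real `x` comes from the site-file pull), and "01234567" is an invented filler site number.

```r
# Site numbers from this thread, kept as character strings so leading zeros survive.
wanted <- c("50063440", "16229000", "02430615", "04127918")

# Hypothetical stand-in for the site table `x`; "01234567" is an invented filler.
x <- data.frame(
  site_no = c("50063440", "16229000", "04127918", "01234567"),
  stringsAsFactors = FALSE
)

# TRUE/FALSE per requested site, labelled by site number.
found <- wanted %in% x$site_no
names(found) <- wanted

# Just the sites that are absent from the table.
missing_sites <- setdiff(wanted, x$site_no)
```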

@ldecicco-USGS
Contributor

And when you go to the NWIS page for the site that still doesn't show up with internal data set to TRUE, you see this:

[image]

So...I'm not surprised to not get that data
