Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleaning discrete gw sw meas data #58

Merged

Conversation

msleckman
Copy link
Collaborator

@msleckman msleckman commented Dec 22, 2022

This PR adds several adjustments to capture all discrete nwis sw and gw data for visualization purposes.

Associated with issue #57 and must be merged after #56

Changes:

renamed sites_along_waterbody.R --> process_nwis_data.R
renamed 4_exports folder --> 4_outputs
new targets:
p2_nwis_meas_gw_data

new functions:
standardized add_stream_order() for adding stream order category to discrete and continuous (dv) sw sites
New more standard join_site_spatial_info() that performs spatial join with target p2_nwis_sites_in_watersheds_sf (i.e. all sites in watersheds) and give info the data records dataset such as lat lon and site_type

discrete data outputs
4_outputs/out/p4_nwis_meas_sw_data_rds
4_outputs/out/p4_nwis_meas_gw_data_rds

numbers: (ref gist L109)

> ## count of individual measurements
> meas_sw_clean %>% count() %>% pull(n)
[1] 48083
> meas_gw_clean %>% count() %>% pull(n)
[1] 64435
> ## count of unique sites
> summarized_meas_gw %>% ungroup() %>%  count() %>% pull(n)
[1] 3322
> summarized_meas_sw %>% ungroup() %>%count() %>% pull(n)
[1] 635

Over 3,000 discrete sites for gw collected a total of 64435 data records, while only 635 discrete sites for sw collecting a total of 48083.

@msleckman msleckman marked this pull request as ready for review December 22, 2022 19:55
@msleckman msleckman requested a review from cnell-usgs December 22, 2022 20:09
@msleckman
Copy link
Collaborator Author

msleckman commented Dec 22, 2022

currently have a merge conflict due to deletion of 4_export.R . This file no longer exists in this branch but exists in upstream main.

  • To resolve

@msleckman
Copy link
Collaborator Author

msleckman commented Dec 23, 2022

currently have a merge conflict due to deletion of 4_export.R . This file no longer exists in this branch but exists in upstream main.

* [x]  To resolve

Managed via a strange work around in this PR #59. Pipeline runs. 4_export.R no longer in repo

Copy link
Member

@cnell-usgs cnell-usgs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think rather than creating long form targets, we need site-level targets that gather site info like location, streamorder, data type, lake, etc. I suggest eitther (1) creating separate targets for each data type at the site-level, or (2) creating a master site-level target that provides this information for all sites, including what type of data are available there. This will be very useful for developing summaries for each site, lake, and making visuals.

join_site_col = 'site_no') %>%
## both dfs have a site_tp_cd col so when joining, two versions are created. Resetti
mutate(site_tp_cd = site_tp_cd.y) %>%
select(!contains(c('.x','.y'))) %>%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're getting these because you're defining the join to one column. It would be better to allow all matching columns be used in the join (e.g. not providing a specific join col), unless there is reason to expect that same-named cols are different between the joined objects and both should exist.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. I'll change such that site_no & site_tp_cd are part of the join.

@cnell-usgs
Copy link
Member

Also do not commit the exported data files to the repo

@msleckman
Copy link
Collaborator Author

msleckman commented Jan 3, 2023

Again, I think rather than creating long form targets, we need site-level targets that gather site info like location, streamorder, data type, lake, etc. I suggest eitther (1) creating separate targets for each data type at the site-level, or (2) creating a master site-level target that provides this information for all sites, including what type of data are available there. This will be very useful for developing summaries for each site, lake, and making visuals.

Do you have a example for this e.g. example of what these site-level targets would look like? Can this tidying task fall into a new PR and can I merge this? I can merge after addressing ⬇️ and the new merge conflict that has come up since the rename commits were pushed to main in #56)
I agree that these more modular summarizing targets would be really helpful downstream, but feel it should be a separate PR.

Also do not commit the exported data files to the repo
Ah, this was a accidental push. My error. I changed items in the .gitignore and this ended up getting pushed.

@cnell-usgs
Copy link
Member

Sure go ahead and merge once the conflicts are addressed. I will work on adding the site-level target.

@cnell-usgs
Copy link
Member

By tidy site-level target I mean a dataframe where there is a single row for each site, and related info in the columns. It would be useful to store important site info, like streamorder

@msleckman
Copy link
Collaborator Author

Following a lengthy process of attempting to resolve merge conflicts that appeared on three files 2_process_site_data.R, 2_process_lake_tribs.R, and 12_process/src/saline_lakes_waterbody.R` which were ultimately removed in this PR to resolve #66

This PR still did not make the conflict in 2_process_lake_tribs.R go away. As a result I have deleted it in the branch (renamed it to a file that is not being tracked by this PR) in this commit 0203c13.

Will add via a new PR #67

@msleckman msleckman merged commit abd9e5f into USGS-VIZLAB:main Jan 3, 2023
msleckman added a commit to msleckman/saline-lakes that referenced this pull request Jan 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants