catchment weights #28
@JordanLaserGit I can't say for certain whether this would fit in here, since I haven't had the chance to generate forcing files, only use them. Would you be able to give me an example of the weights files?
https://ngenresourcesdev.s3.us-east-2.amazonaws.com/01_weights.json

@program-- Here's a weight file I generated a few months ago, along with the geopackage it came from (v20 hydrofabric, I believe). This isn't an immediate need; I just figured these weight files should be standardized and live somewhere the community can access readily. If you're curious to see the implementation in the forcing generation, you can find it here: https://github.com/CIROH-UA/ngen-datastream/tree/main/forcingprocessor
@JordanLaserGit Is this file generating the weights? If so, I think it's possible to integrate something similar on the service side (or maybe the client side, since that would let the user pick which forcing to use?). For now, let's keep this issue open in case of further discussion, and I'll note the requirement as an addition to the hfsubset API service. 😄
@program-- Yup, that'll create the weights. Assuming we do something like run that script once on CONUS and then subset the resulting file, the speed of that script probably isn't too big of an issue. Though if we want it to run on user request, we may want to optimize it: CONUS took something like 18 hours (with no parallelization whatsoever). Sounds good, thanks for taking a look at this!
Ask and ye shall receive! @JoshCu just sped things up: CIROH-UA/ngen-datastream#36
In theory a further 2.5x speedup is possible by using geometry windows to rasterize only a subset of the whole region, but I'm having some interesting issues with transforms: depending on whether I open the source NetCDF with rasterio or with xarray, the y axis is flipped, and there's some rounding error by the looks of it. Entirely possible I'm doing something weird though, I don't have much experience with this.

Code to generate the (currently incorrect) weightings quickly:
- rasterio: using netcdf4 (I think)
- xarray: definitely using netcdf4

Current weights:

```json
{"cat-18": [[2421, 2422, 2422, 2423, 2423, 2424, 2424, 2424, 2424, 2424, 2425, 2425, 2425, 2425, 2425, 2426, 2426, 2426, 2426, 2426, 2426, 2427, 2427, 2427, 2427, 2427, 2427, 2427, 2427, 2428, 2428, 2428, 2428, 2428, 2428, 2428, 2428, 2428, 2429, 2429, 2429, 2429, 2429, 2429, 2430, 2430, 2431, 2431], [4323, 4323, 4324, 4323, 4324, 4322, 4323, 4324, 4325, 4326, 4322, 4323, 4324, 4325, 4326, 4322, 4323, 4324, 4327, 4328, 4329, 4321, 4322, 4323, 4324, 4325, 4326, 4327, 4328, 4321, 4322, 4323, 4324, 4325, 4326, 4327, 4328, 4329, 4324, 4325, 4326, 4327, 4328, 4329, 4327, 4328, 4327, 4328]]}
```

Weights with rasterio transform (y axis flip has been corrected for):

```json
{"cat-18": [[2432, 2433, 2433, 2434, 2434, 2435, 2435, 2435, 2435, 2435, 2436, 2436, 2436, 2436, 2436, 2437, 2437, 2437, 2437, 2437, 2437, 2438, 2438, 2438, 2438, 2438, 2438, 2438, 2438, 2439, 2439, 2439, 2439, 2439, 2439, 2439, 2439, 2439, 2440, 2440, 2440, 2440, 2440, 2440, 2441, 2441, 2442, 2442], [4323, 4323, 4324, 4323, 4324, 4322, 4323, 4324, 4325, 4326, 4322, 4323, 4324, 4325, 4326, 4322, 4323, 4324, 4327, 4328, 4329, 4321, 4322, 4323, 4324, 4325, 4326, 4327, 4328, 4321, 4322, 4323, 4324, 4325, 4326, 4327, 4328, 4329, 4324, 4325, 4326, 4327, 4328, 4329, 4327, 4328, 4327, 4328]]}
```

The indices in the first array are all shifted by 11, but otherwise the result is the same. If those rasterio weightings are correct, then it only takes ~10 minutes to create CONUS weightings.
@JoshCu for my own purposes, I rewrote the weights generator in R:

```r
#!/usr/bin/env Rscript

# Paths to the hydrofabric geopackage, the forcing grid, and the output JSON
gpkg_path <- "..."
grid_path <- "..."
output    <- "..."

# microbenchmark::microbenchmark({

# Read the RAINRATE variable from the NetCDF forcing grid
grd <- terra::rast(paste0(
  "netcdf:",
  grid_path,
  ":RAINRATE"
))

# Read the divides layer and project it to the grid's CRS
gpkg <-
  terra::vect(gpkg_path, "divides") |>
  terra::project(terra::crs(grd))

# Rasterize divide_id onto the grid (touches = TRUE keeps partially covered cells)
weight_rast <-
  terra::rasterize(
    x = gpkg,
    y = grd,
    field = "divide_id",
    background = NA_character_,
    touches = TRUE
  ) |>
  terra::flip(direction = "vertical")

# Convert rasterized cells to (divide_id, row, col) records
weights <-
  terra::as.data.frame(weight_rast, cells = TRUE) |>
  dplyr::mutate(
    row = terra::rowFromCell(weight_rast, cell),
    col = terra::colFromCell(weight_rast, cell)
  ) |>
  dplyr::select(-cell) |>
  dplyr::arrange(divide_id, row, col)

# Write {divide_id: [[rows], [cols]]} JSON
split(weights, weights$divide_id) |>
  lapply(FUN = function(part) {
    setNames(as.list(part[, c("row", "col")]), c())
  }) |>
  jsonlite::write_json(output)

# }, times = 5L)
```

This seems to give a decent speedup for VPU 01, if I'm interpreting your results correctly. However, I'm experiencing something similar to your issue, where my coordinates are (seemingly) slightly off. I've attached my example weights output from R here: weights-r.json
@program-- I'm just timing it with bash (`time`). I had no idea you could have folding sections in md, I'll be overusing them from now on.

VPU 01 on 56 cores, ~12.5s:

```
Starting at 2024-01-11 22:12:28.170569
Time spent opening files 0:00:00.344885
Total processing time: 0:00:08.741575
Total writing time: 0:00:00.318865
Total time: 0:00:11.566039

real    0m12.570s
user    6m55.572s
sys     0m12.635s
```

We do seem to have different weightings outputs. I'm not sure what the best way to diff two large JSON files is, but they're different sizes and grep spot checks show different results.

xarray transform:

```json
"cat-18": [[2421, 2422, 2422, 2423, 2423, 2424, 2424, 2424, 2424, 2424, 2425, 2425, 2425, 2425, 2425, 2426, 2426, 2426, 2426, 2426, 2426, 2427, 2427, 2427, 2427, 2427, 2427, 2427, 2427, 2428, 2428, 2428, 2428, 2428, 2428, 2428, 2428, 2428, 2429, 2429, 2429, 2429, 2429, 2429, 2430, 2430, 2431, 2431], [4323, 4323, 4324, 4323, 4324, 4322, 4323, 4324, 4325, 4326, 4322, 4323, 4324, 4325, 4326, 4322, 4323, 4324, 4327, 4328, 4329, 4321, 4322, 4323, 4324, 4325, 4326, 4327, 4328, 4321, 4322, 4323, 4324, 4325, 4326, 4327, 4328, 4329, 4324, 4325, 4326, 4327, 4328, 4329, 4327, 4328, 4327, 4328]]
```

rasterio transform:

```json
"cat-18": [[2432, 2433, 2433, 2434, 2434, 2435, 2435, 2435, 2435, 2435, 2436, 2436, 2436, 2436, 2436, 2437, 2437, 2437, 2437, 2437, 2437, 2438, 2438, 2438, 2438, 2438, 2438, 2438, 2438, 2439, 2439, 2439, 2439, 2439, 2439, 2439, 2439, 2439, 2440, 2440, 2440, 2440, 2440, 2440, 2441, 2441, 2442, 2442], [4323, 4323, 4324, 4323, 4324, 4322, 4323, 4324, 4325, 4326, 4322, 4323, 4324, 4325, 4326, 4322, 4323, 4324, 4327, 4328, 4329, 4321, 4322, 4323, 4324, 4325, 4326, 4327, 4328, 4321, 4322, 4323, 4324, 4325, 4326, 4327, 4328, 4329, 4324, 4325, 4326, 4327, 4328, 4329, 4327, 4328, 4327, 4328]]
```

terra transform:

```json
"cat-18":[[2424,2425,2425,2425,2426,2426,2426,2426,2426,2427,2427,2427,2427,2427,2428,2428,2428,2428,2428,2428,2428,2429,2429,2429,2429,2429,2430,2430,2430,2431,2432],[4324,4324,4325,4326,4323,4324,4325,4326,4327,4324,4325,4328,4329,4330,4323,4324,4325,4326,4327,4328,4329,4326,4327,4328,4329,4330,4328,4329,4330,4329,4329]]
```

It would be easier to see if I could figure out how to wrap lines in code blocks. I seem to have double the number of coordinates for the catchments. RTI have a weightings generator we could compare to, but I think I'll leave that to someone whose understanding extends beyond my insightful "the numbers are different" comments.
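For completeness, one way to diff two big weights files without grep is to compare the (row, col) sets per catchment. A rough sketch, assuming both files use the `{"cat-XX": [[rows], [cols]]}` layout shown above (the file names are placeholders):

```python
# Sketch: compare two weights JSON files catchment-by-catchment.
import json

with open("weights_a.json") as f:
    a = json.load(f)
with open("weights_b.json") as f:
    b = json.load(f)

only_a = a.keys() - b.keys()
only_b = b.keys() - a.keys()
print(f"catchments only in A: {len(only_a)}, only in B: {len(only_b)}")

for cat in a.keys() & b.keys():
    cells_a = set(zip(*a[cat]))   # pair rows and cols into (row, col) cells
    cells_b = set(zip(*b[cat]))
    if cells_a != cells_b:
        print(cat, "differs:", len(cells_a - cells_b), "cells only in A,",
              len(cells_b - cells_a), "only in B")
```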
@JordanLaserGit @JoshCu https://github.com/isciences/exactextract just recently got Python bindings integrated (officially)... Check it out for weight generation; it's super fast and accurate. (And likely how the R code mentioned above is so quick 😉)
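As a very rough illustration of what that could look like for weight generation (not the hfsubset or ngen-datastream implementation; the `cell_id`/`coverage` operations and the pandas output mode are assumed to behave as described in the exactextract docs, and the paths, layer, and field names follow the hydrofabric convention used earlier in this thread):

```python
# Sketch: per-catchment cell ids and coverage fractions via exactextract's
# Python bindings. Paths, layer, and field names are assumptions, not a spec.
import geopandas as gpd
import rasterio
from exactextract import exact_extract

divides = gpd.read_file("nextgen_01.gpkg", layer="divides")

with rasterio.open("netcdf:forcing.nc:RAINRATE") as grid:
    divides = divides.to_crs(grid.crs.to_wkt())   # reproject polygons onto the grid CRS
    weights = exact_extract(
        grid,
        divides,
        ["cell_id", "coverage"],                  # flattened cell index + fraction of cell covered
        include_cols=["divide_id"],
        output="pandas",
    )

print(weights.head())
```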
Yeah, VPU 01 is that link you sent (err, I might have used v20.1, but they should be effectively the same for this...). The size difference is most likely from the formatting between the JSON files; minifying has a drastic effect sometimes. It's interesting to see that all three generators have different results 😅 maybe @mikejohnson51 can weigh in (haha) to verify whether my R implementation is correct, since he's better with gridded data than I am (if you have time, Mike!).
How dare you, I did it by hand 😛 But that would be useful; I didn't know you could actually generate weights with exactextract haha, so I'll have to check that out. I've only used it a bit.
Hi all! Yes, we used to deliver the catchment weights for a few forcing products with each release. The reason that approach is more accurate is that it takes into account partial coverages (e.g. a polygon covering 1/2 a cell). We stopped sharing those as the forcing workstream became more mature and all groups (forcing, model engine, etc.) wanted to do it themselves. Overall, I am not opposed to reintroducing the weights so that they are readily accessible.
The short-term goal is to get daily ngen outputs via ngen-datastream. In this, a forcings engine takes the weights as an input. I've already generated the weights for v2.1, and we will reuse the same file until the next hydrofabric version comes out, so having a CONUS weights file that could be subsetted would be the ideal dataset. It's not a pressing need though, as ngen-datastream is still under development. The primary motivation here is standardizing the catchment weights so we can all agree on using the same tool.
Makes perfect sense to me. Remember though (and I'm sure you've got this): a weights file will only be valid for one grid (same grid size and CRS). We can add this to the releases. Would you mind adding an issue to the hydrofabric repo to track this, with a pointer to a sample forcing file?
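One lightweight way to make that one-grid constraint hard to violate (purely illustrative; the metadata keys and file names here are made up, not an agreed-on schema) is to store the grid's shape, CRS, and transform next to the weights and check them before the weights are reused:

```python
# Sketch: stash the grid metadata next to the weights so a mismatched grid
# (different shape, CRS, or transform) is caught before the weights are reused.
import json
import rasterio

def grid_signature(path, var="RAINRATE"):
    with rasterio.open(f"netcdf:{path}:{var}") as src:
        return {
            "shape": [src.height, src.width],
            "crs": src.crs.to_wkt() if src.crs else None,
            "transform": list(src.transform)[:6],
        }

meta = {"grid": grid_signature("forcing.nc"), "weights_file": "01_weights.json"}
with open("01_weights.meta.json", "w") as f:
    json.dump(meta, f, indent=2)

# Later, before reusing the weights against a new forcing file:
assert grid_signature("new_forcing.nc") == meta["grid"], \
    "weights were built for a different grid"
```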
@hellkite500 exactextract looks great! I didn't realise quite how low-resolution the NetCDFs are compared to the water basin geometries until I overlaid them just now. Do you know if there's a recommended method to parallelise it? I think it's caching the % of each pixel covered by the geometries, so I've been chunking the work by geometry rather than by time to prevent recalculating the polygon coverage. I'm just using Python's multiprocessing pool as it's what I'm familiar with, but it's probably not the best way of doing it?
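In case it helps the discussion, the geometry-chunked approach described above might look roughly like this (a sketch only, not the actual ngen-datastream code; it reuses the assumed exactextract calls from the earlier sketch and leaves out the CRS reprojection for brevity):

```python
# Sketch: chunk the catchment polygons across a multiprocessing Pool so each
# worker computes coverage for its own subset of geometries exactly once.
from multiprocessing import Pool

import geopandas as gpd
import pandas as pd
from exactextract import exact_extract

GRID_PATH = "forcing.nc"   # hypothetical forcing file
VARIABLE = "RAINRATE"

def weights_for_chunk(chunk):
    # Each worker opens the raster itself; open dataset handles don't pickle well.
    import rasterio
    with rasterio.open(f"netcdf:{GRID_PATH}:{VARIABLE}") as grid:
        return exact_extract(grid, chunk, ["cell_id", "coverage"],
                             include_cols=["divide_id"], output="pandas")

if __name__ == "__main__":
    divides = gpd.read_file("nextgen_01.gpkg", layer="divides")
    n_workers = 56
    chunks = [divides.iloc[i::n_workers] for i in range(n_workers)]  # round-robin split
    with Pool(n_workers) as pool:
        weights = pd.concat(pool.map(weights_for_chunk, chunks), ignore_index=True)
    weights.to_parquet("weights.parquet")
```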
See here for a CONUS weights parquet file: s3://lynker-spatial/v20.1/forcing_weights.parquet
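Subsetting that file for a particular extraction is then just a filter. A minimal sketch, assuming a `divide_id` column (check the actual schema of `forcing_weights.parquet` before relying on it; the catchment ids here are examples):

```python
# Sketch: subset the CONUS weights parquet to a list of catchments.
import pandas as pd

weights = pd.read_parquet("forcing_weights.parquet")   # downloaded from the S3 path above
wanted = ["cat-18", "cat-19"]                          # catchments from an hfsubset extraction
subset = weights[weights["divide_id"].isin(wanted)]
subset.to_parquet("subset_weights.parquet")
```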
In order to create NextGen forcing files, weights (indices) are calculated from the hydrofabric and National Water Model files. Currently, this is done by rasterizing each catchment, which is a time-consuming process. Would subsetting from a conus_weight.json file be a better solution? Would hfsubset be a good place to do this? These weights might change with each hydrofabric release, so it may be nice to have the weights computed automatically along with the geopackages.