emergency services deduplication code #13
base: flooding
Conversation
aileenmcd commented on Apr 7, 2022
- Added a function to utils.R that follows the multipolygon, polygon, points method for OSM data.
- Took additional steps to reduce duplications further on line 119, in cases where there appear to be multiple polygons relating to the same building. The assumption made is that it is unlikely there would be more than one building for the same service within the same OA here.
- When a polygon or multipolygon spans multiple OAs, take the OA with the largest intersection area here, so the output can be produced at LSOA level.
- Re-ran the build index script and performed checks on the effect the new service code had on the 'Community Support' domain deciles and the flooding vulnerability output deciles here.
- Changed the social renters data, as it appears to have been overwritten with the Wales data since the last index build for England, which affected the index build results when comparing to the previous result.
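The largest-intersection step described above could be sketched roughly as follows. This is a hedged illustration, not the PR's actual code: the object names `services` and `oa_boundaries` are assumptions, while `st_intersection()`, `st_area()`, and `slice_max()` are real sf/dplyr functions.

```r
library(sf)
library(dplyr)

# Assign each (multi)polygon to the single OA it overlaps most,
# so a feature spanning several OAs is counted only once.
# Assumes `services` is an sf object with an osm_id column and
# `oa_boundaries` is an sf object with an OA11CD column.
services_with_oa <- services |>
  st_intersection(oa_boundaries) |>                 # split features by OA boundary
  mutate(intersection_area = st_area(geometry)) |>  # area of each intersected piece
  group_by(osm_id) |>
  slice_max(intersection_area, n = 1, with_ties = FALSE) |>  # keep largest piece per feature
  ungroup()
```

The resulting one-OA-per-feature table can then be aggregated up to LSOA level.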
Left you some feedback to review. Interested in your thoughts 👍
points_not_polygon_multipolygon_overlap <- points |>
  st_join(polys_multipolys) |>
Could some defensive programming be introduced for this join, to help with potential debugging?
Added in f030d5b.
# Check if error on joins
tryCatch(
  {
    polygons |>
      st_join(multipolygons)
  },
  error = function(e) {
    message("There is a joining error, you may need to turn off s2 processing using sf::sf_use_s2(FALSE)")
  }
)
Good job on the defensive programming style 😃
services_eng_dups <- services_eng |>
  group_by(OA11CD, service) |>
  mutate(count_id = n()) |>
  filter(count_id > 1) |>
  arrange(desc(count_id), OA11CD, name) |>
  st_transform(crs = 4326)

# Take the top row per group: prefer rows with a name (if not NULL for all),
# then the largest size
services_eng_dedup <- services_eng_dups |>
  mutate(size = st_area(geometry)) |>
  group_by(OA11CD, service) |>
  arrange(OA11CD, name, desc(size)) |>
  slice(1)

services_eng_dedup <- services_eng |>
  filter(!osm_id %in% services_eng_dups$osm_id) |>
  bind_rows(services_eng_dedup)
I wonder if this method should also be bundled into a separate R function?
Are we safe in the assumption that it is unlikely the same building will be used for the same service in the same OA?
My intuition is that this assumption is sound, and is probably something we want to generically apply to all of the outputs from the osm_data_reduce() function.
Yeah, I suppose it's difficult to validate the assumption, and there may be exceptions where there is more than one of a service in a single OA. But the thinking was: there are about 171k OAs in England, and taking fire (which has more stations than police or ambulance) there are around 1,400 stations, so if these are well distributed there is in theory unlikely to be crossover (OAs contain roughly ~125 households and have a population of ~300). I suppose it's a balance between this assumption, which may have some exceptions, and the duplicated cases within the OSM data, which would otherwise need manual checking.
Have put this deduplication part into a separate function, osm_oa_deduping(), in case we want to use the logic in osm_data_reduce() and osm_oa_deduping() separately. This has been updated in 558a30a.
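As a rough sketch, the inline deduplication logic from this thread could be wrapped along the following lines. This is a hedged illustration based on the snippet discussed above, not the actual implementation in 558a30a, which may differ; the function name `osm_oa_deduping` and its single-argument signature are taken from the comment, while the column names (`OA11CD`, `service`, `name`, `osm_id`, `geometry`) match the earlier code.

```r
library(sf)
library(dplyr)

# Sketch: remove duplicate OSM features for the same service within an OA,
# keeping one row per (OA, service) -- preferring named rows, then largest area.
osm_oa_deduping <- function(services) {
  # Rows that are duplicated within the same OA for the same service
  dups <- services |>
    group_by(OA11CD, service) |>
    filter(n() > 1) |>
    ungroup()

  # Pick one representative per (OA, service) group
  deduped <- dups |>
    mutate(size = st_area(geometry)) |>
    group_by(OA11CD, service) |>
    arrange(OA11CD, name, desc(size)) |>
    slice(1) |>
    ungroup()

  # Recombine: non-duplicated rows plus one representative per duplicated group
  services |>
    filter(!osm_id %in% dups$osm_id) |>
    bind_rows(deduped)
}
```

Bundling the logic this way would let both osm_data_reduce() and callers that already have OA-joined data reuse the same deduplication step.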
@aileenmcd is this ready for merging?
Great. @matthewgthomas I've done an initial review (see my comments above). Please can you do a final review and check you are happy with the code/logic?