Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UK Postcode support to internal geocoder using open data from ONS #498

Open
3 tasks
stevelewis99 opened this issue Jun 8, 2018 · 3 comments
Open
3 tasks

Comments

@stevelewis99
Copy link

Context

Currently the internal geocoder in CARTO doesn't really support the geocding of UK postcodes very well, success rate and accuracy is poor - but only due to missing data. Reasons to address:

  • UK Users not already knowledgable about CARTO find it confusing that when you geocode using the 'postal codes' option, it doesn't really work very well
  • Customers who have used other GIS tools (e.g. mapinfo) and comparing how we handle the data to those other tools, it looks like we just aren't very good/mature .. when in fact this is not true.
  • Solutions/Support always advise to forget the 'postal codes' option as this uses internal geocoder, and instead use the 'street addresses' option with just the postcode.
    -- This uses LDS credits, doesn't cache them, and often people have more rows than LDS credits.. they end up using online batch geocoders/google API to get the raw data sorted out before bringing it into CARTO (so this outcome is like a fail, in my opinion)
    -- Already, before we've even got into map creation and analysis, they are having user experience issues .. ones which we could solve with adding relevant data to our internal geocoder 👍
    -- The datasets I propose we use to solve this are very likely the same data used by some of the geocoding services we rely on to give us the right answers (in return for payment) .. so it could save us money by doing it ourselves.

Manual investigation

The Office for National Statistics releases a CSV: National Statistics Postcode Lookup (Latest) Centroids on a regular basis (I think 6 months)

This lookup table allows matching of a full UK postcode to lat/long (as well as a huge amount of other boundary systems.. if we wanted to leverage that later too, maybe in the DO?... but for now I'm just focussed on mapping postcode strings to their centroids).

Here's a screenshot of the columns in the file:

image

To test this dataset could work, I did the following:

  1. Reduced the column set down, to only include the postcode, latitude, longitude, positional quality

  2. Removed all whitespace from the postcode, renamed to postcode_nospace

  3. Uploaded to CARTO (where the import guessing converted the latitude and longitude into points)
    image

  4. Wrote a db function to take a postcode string from a source dataset to be geocoded, remove the white space from the string, select the_geom from the lookup dataset created in [3]

  5. Used sql update statement on the source dataset to write into the_geom the result from the function.

UPDATE "steve-carto".postcode_samples
set the_geom = util_geocode_uk_postcode_point(postcode, 'postcode_lookup_uk')

image

image

It seems to work perfectly. I have to go to a customer (GVA) and train them on how to do this manual process, give them the function etc. as a short term fix for them.

But it's pretty much doing the same job as the internal geocoder, from what I understand - so it would be AWESOME if we could just massage this lookup data into our existing mechanism.

It could save us money on third party services, as well as make us look way better, and most importantly customer experience would improve too when using UK postcodes (common scenario).

If you have any questions or want more info please reach out to me!

cc @hannahblue

p.s. in future this data could also support reverse geocoding .. given a point, find the nearest postcode.

To do (I'll let you rewrite these I'm just inventing steps here)

  • Hannah to agree/confirm (pretty please?)
  • Determine the best way to ingest, modify, synchronize the dataset
  • Test and enjoy our vastly improved in-house geocoding capability
@stevelewis99
Copy link
Author

Also, if we really don't want all the extra column mappings from this 'gridlink' product (I think it would be useful though if we can?) then this is another source for purely geocoding from postcodes and nothing else.
https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-open.html

HERE told us it's exactly what they use.

But the data in the national postcode lookup table comes from ordnance survey, so it's all the same data.

User guide here too with all the details, if needed. https://data.gov.uk/dataset/e7308379-35af-46e6-a570-8825fec1e008/national-statistics-postcode-lookup-may-2018-user-guide

@stevelewis99
Copy link
Author

@hannahblue any ideas if/when we could prioritise?

@hannahblue
Copy link

We're putting in place 3 days every 6 weeks for the Data Team to update and add data sets in the DO. We have some other things in the queue like updating the Census for Canada, US, and Australia as well as requests for adding Japan data.

We also have a number of geocoding improvements on the roadmap to do things like return metadata, and let users store the results of geocodes among other things.

We'll need to figure out what the level of effort is for implementing this, where it fits in the priority, and whether this can be added during one of the 3 data update days or needs to be part of the geocoding improvement project. I definitely understand the value, and want to get this in as soon as it makes sense. Thanks for raising the issue.

cc @juanignaciosl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants