Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a robust and scalable architecture for storing and retrieving locations #5

Open
specious opened this issue Dec 21, 2021 · 1 comment
Labels
enhancement New feature or request

Comments

@specious
Copy link
Member

At the current stage, the client requests locations within a radius of a point, and the database (scanning a subset roughly limited to the area being searched) returns a collection comprised of all points within that region.

To a degree, the storage and retrieval system is already optimized for retrieval by spatial proximity, since each location record is stored with a geohash (by using the PostGIS point geometry type when storing the location) alongside its latitude and longitude coordinates, which essentially encodes its proximity to other points.

However, since there is no limit as to the total number of locations that can be marked on the globe, this strategy will need to be replaced with something robust enough to perform well at arbitrary scale.

A proper system for object retrieval will be able to:

  • Receive a request for the set of locations within an area
  • Return back a set of roughly maximum configurable size, with entities representing aggregate clusters of locations where there are too many in a vicinity

Since requests will be made at multiple levels of detail (often including the whole world), the cluster aggregation information needs to be prepared as the data is stored, so it is ready to be served. In other words, no query should have to go through the entire data set.

The solution I propose is to use the H3 geospatial indexing system, whereby the world is implicitly partitioned into hexagonal cells at each level of detail that corresponds to queries at different zoom levels. Essentially this extends the geohashing to indexing the aggregate clusters at the time that locations are added.

This means that making a query at any particular zoom level, it will not be possible to immediately distinguish between points within the same hexagon at that resolution. However, it would be useful to, for example, be able to pick a "representative" point in the cluster, so that it could be advertised as one of the locations within that hexagon.

@specious specious added the enhancement New feature or request label Dec 21, 2021
@specious
Copy link
Member Author

@JasonSanford, @danieltoben introduced you to me as having specific experience working with systems that store geographic data. If you so happen to be inclined, I would absolutely appreciate your opinion on this.

@specious specious changed the title A robust and scalable architecture for storing and retrieving locations Implement a robust and scalable architecture for storing and retrieving locations Jun 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Todo
Development

No branches or pull requests

1 participant