indexing new features without crawling everything all over again #8

Open
ksonda opened this issue Aug 22, 2022 · 0 comments
ksonda commented Aug 22, 2022

In the medium term, we anticipate a scenario where organizations add new tranches of PIDs on a somewhat frequent, if irregular, basis (e.g. 1000 PIDs in week 1, 0 PIDs in week 2, 100 PIDs in week 3, etc.).

We need a way for Gleaner to crawl only the new or updated portions of the sitemap. Things to do:

  1. Gleaner should be able to target sitemap entries by lastmod (e.g. lastmod >= yesterday, or a similar cutoff that suits frequent checking). @fils
  2. sitemap.xml should track the lastmod of each constituent sitemap link so that it accurately reflects the last time PIDs were submitted to github.com/internetofwater/geoconnex.us. @webb-ben
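
To make the two items above concrete, here is a minimal sketch in Python of both sides of the idea. It is not Gleaner's implementation or the geoconnex.us sitemap generator. The function names, the `example.org` URLs, and the choice to re-crawl entries that have no lastmod are all assumptions made for illustration.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS_URI = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": NS_URI}


def build_sitemap_index(last_submitted):
    """Item 2 (hypothetical helper): write a sitemap index whose lastmod values
    come from a mapping of sitemap URL -> timezone-aware submission time."""
    ET.register_namespace("", NS_URI)  # serialize with a default xmlns
    root = ET.Element(f"{{{NS_URI}}}sitemapindex")
    for loc, when in sorted(last_submitted.items()):
        entry = ET.SubElement(root, f"{{{NS_URI}}}sitemap")
        ET.SubElement(entry, f"{{{NS_URI}}}loc").text = loc
        ET.SubElement(entry, f"{{{NS_URI}}}lastmod").text = (
            when.astimezone(timezone.utc).isoformat()
        )
    return ET.tostring(root, encoding="unicode")


def parse_lastmod(text):
    """Parse a date-only or full W3C datetime into a timezone-aware datetime."""
    parsed = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: values without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def entries_modified_since(sitemap_xml, cutoff):
    """Item 1 (hypothetical helper): return the <loc> of every entry whose
    lastmod is at or after cutoff. Works on <sitemapindex> and <urlset>."""
    root = ET.fromstring(sitemap_xml)
    entries = root.findall("sm:sitemap", NS) + root.findall("sm:url", NS)
    selected = []
    for entry in entries:
        loc = entry.findtext("sm:loc", namespaces=NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=NS)
        if loc is None:
            continue
        # Assumption: an entry with no lastmod is crawled, to be safe.
        if lastmod is None or parse_lastmod(lastmod) >= cutoff:
            selected.append(loc.strip())
    return selected


# Usage with made-up URLs and dates.
index_xml = build_sitemap_index({
    "https://example.org/sitemap/a.xml": datetime(2022, 8, 22, tzinfo=timezone.utc),
    "https://example.org/sitemap/b.xml": datetime(2022, 8, 1, tzinfo=timezone.utc),
})
cutoff = datetime(2022, 8, 21, tzinfo=timezone.utc)
print(entries_modified_since(index_xml, cutoff))
```

In practice the cutoff would probably be the time of the last successful crawl rather than a fixed "yesterday", so that a missed run does not cause a tranche to be skipped.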

3 participants