Use Data Together Sentry to crawl EPA data.json #199

Closed
2 tasks done
dcwalk opened this issue Aug 17, 2017 · 2 comments

dcwalk commented Aug 17, 2017

Picking up from #119 and #120: to further identify the datasets the EPA has, so that we can both:

  1. Map the amount we've downloaded ("coverage"), and
  2. Download those we don't already have

... we want to use Data Together's sentry to crawl the EPA's Environmental Dataset Gateway data.json JSON-LD. To get there, @b5 wants to bring sentry to feature parity with the WARC 1.1 spec, which is tracked in datatogether/roadmap#26.
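For anyone picking this up, here is a rough sketch of how the URL list could be pulled out of a data.json catalog before handing it to a crawler. It assumes the file follows the Project Open Data layout (a top-level `dataset` array whose entries carry `landingPage` and a `distribution` list with `accessURL` / `downloadURL`); I have not checked that against the actual EDG file, and `extract_urls` and the sample catalog are made up for illustration.

```python
def extract_urls(catalog):
    """Collect unique URLs from a data.json-style catalog dict, in first-seen order.

    Assumes Project Open Data field names; adjust if the EDG file differs.
    """
    urls = []
    seen = set()
    for dataset in catalog.get("dataset", []):
        candidates = [dataset.get("landingPage")]
        # "distribution" may be missing or null, so fall back to an empty list.
        for dist in dataset.get("distribution") or []:
            candidates.append(dist.get("accessURL"))
            candidates.append(dist.get("downloadURL"))
        for url in candidates:
            if url and url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


# Hypothetical two-dataset catalog, not real EPA data.
sample_catalog = {
    "dataset": [
        {
            "landingPage": "https://example.org/a",
            "distribution": [{"downloadURL": "https://example.org/a.csv"}],
        },
        {
            "landingPage": "https://example.org/a",
            "distribution": [{"accessURL": "https://example.org/b"}],
        },
    ]
}

sample_urls = extract_urls(sample_catalog)
```

In practice the catalog would come from `json.load` on the downloaded data.json, and the resulting list is what the crawler would be pointed at.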

TODOs:

  • Point the Data Together sentry at the list of URLs from the JSON-LD
    Based on some initial spot-checking, a lot of those links are dead or useless.
  • Examine the list of non-404 URLs reported post-crawl
    Confirm those are in archivers.space or captured by other means.
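The post-crawl check in the second TODO could look something like the sketch below. It assumes the crawl results can be reduced to a URL-to-status-code mapping and that we have a set of URLs already captured; the actual sentry output format and archivers.space export are not specified here, so `triage` and both inputs are hypothetical.

```python
def triage(statuses, archived):
    """Split crawl results into live URLs, live-but-uncaptured URLs, and coverage.

    statuses: dict mapping URL -> HTTP status code (assumed shape).
    archived: set of URLs already captured somewhere (assumed shape).
    """
    # Anything that did not come back as 404 counts as "live" for this pass.
    live = sorted(url for url, code in statuses.items() if code != 404)
    missing = [url for url in live if url not in archived]
    coverage = (len(live) - len(missing)) / len(live) if live else 0.0
    return live, missing, coverage


# Made-up example data.
example_statuses = {
    "https://example.org/a": 200,
    "https://example.org/b": 404,
    "https://example.org/c": 200,
    "https://example.org/d": 301,
}
example_archived = {"https://example.org/a"}

live, missing, coverage = triage(example_statuses, example_archived)
```

Here `missing` is the download list (item 2 above) and `coverage` is the share of live URLs we already hold (item 1). Treating every non-404 response as live is a simplification; redirects and server errors probably deserve a closer look.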

This will resolve #120 in prep for #119.

dcwalk commented Sep 12, 2017

During our September 11 Archiving call, this was flagged as an ongoing and important priority. Moving to the Fall Work Cycle milestone.

dcwalk commented Sep 1, 2018

As far as I have been updated, this happened. I think larger discussions about the future of this node and where this data lives are on the horizon. I'm going to close this for now, as I imagine those conversations will pick up further along and reframe future work.

dcwalk closed this as completed Sep 1, 2018