Picking up from #119 and #120: we want to further identify the datasets the EPA has, both to:
Map the amount we've downloaded ("coverage"), and
Download those we don't already have
... we want to use Data Together's sentry to crawl the EPA's Environmental Dataset Gateway data.json JSON-LD. In order to get there, @b5 wants to get sentry to feature parity with the WARC 1.1 spec, which is tracked in datatogether/roadmap#26
TODOs:
Point the Data Together sentry at the JSON-LD urls list
A lot of those links are dead / useless based on some initial spot-checking.
Examine the list of non-404 urls reported post-crawl
Confirm those are in archivers.space or captured by other means
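A minimal sketch of the first and third TODOs, assuming the Gateway's data.json follows the Project Open Data layout (a top-level `dataset` array whose entries carry `landingPage` and a `distribution` list with `accessURL` / `downloadURL`). The function names and the `(url, status)` shape of the crawl report are hypothetical; sentry's actual input and output formats are not described in this issue.

```python
def extract_urls(catalog):
    """Collect the unique URLs referenced by a parsed data.json catalog.

    Assumes the Project Open Data layout: catalog["dataset"] is a list,
    each dataset may have "landingPage" and a "distribution" list whose
    items may have "accessURL" and/or "downloadURL".
    """
    urls = set()
    for dataset in catalog.get("dataset", []):
        landing = dataset.get("landingPage")
        if landing:
            urls.add(landing)
        for dist in dataset.get("distribution", []):
            for key in ("accessURL", "downloadURL"):
                url = dist.get(key)
                if url:
                    urls.add(url)
    return sorted(urls)


def non_404(results):
    """Keep URLs whose crawl status was not 404.

    `results` is a hypothetical crawl report: an iterable of
    (url, http_status) pairs. The real report format may differ.
    """
    return [url for url, status in results if status != 404]
```

The sorted, de-duplicated list could be written one URL per line and handed to the crawler; the `non_404` output is the set to check against archivers.space.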
As far as I have been updated, this happened. I think larger discussions about the future of this node and where this data lives are on the horizon. I'm going to close this for now, as I imagine those conversations will pick up further along and reframe future work.
This will resolve #120 in prep for #119