Downloadable archive of Trefle database #81

SebastianKG · 2020-12-22T21:38:53Z

Is your feature request related to a problem? Please describe.
This may be a stretch as a "feature", but I'm still looking for a way to get at the underlying dataset as a whole, without being rate-limited. I have a long-running crawler for the API and I have slowly collected a lot of it, but it remains incomplete and probably always will (when crawling lawfully with only one API token, the limit is quite strict). Back in August, we discussed a data dump (here: #44), and the following was said:

We will soon provide an archive of our database for you to download, and thus avoid iterating on all the plants.

Describe the solution you'd like
I'm sure the project is strapped for developer time and this may not be a priority, but I would love to build and publicize some cool Apache-Spark-aggregated high-level uses for this data. To enable projects like this, a data dump (or a much more lenient page size limit, which would be more expensive for the project, I expect) seems necessary.
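
For reference, a throttled pagination loop of the kind described above might look like the sketch below. It is a minimal illustration, not Trefle's documented API: the `fetch_page` callable, the `"data"` key in the response, the "empty page means done" stop condition, and the default requests-per-minute value are all assumptions.

```python
import time
from typing import Callable, Iterator, Optional


def crawl_pages(
    fetch_page: Callable[[int], dict],
    max_requests_per_minute: int = 60,  # assumed limit, adjust to the real one
    start_page: int = 1,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[dict]:
    """Yield records page by page, spacing requests to respect a rate limit.

    `fetch_page` is a hypothetical callable that takes a page number and
    returns the decoded JSON body as a dict with a "data" list (assumed shape).
    """
    min_interval = 60.0 / max_requests_per_minute
    last_request: Optional[float] = None
    page = start_page
    while True:
        # Wait until at least `min_interval` seconds have passed since the
        # previous request.
        if last_request is not None:
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()

        records = fetch_page(page).get("data", [])
        if not records:  # assumed end-of-data signal
            break
        yield from records
        page += 1
```

In practice `fetch_page` would wrap the authenticated HTTP call; keeping it as a parameter means the throttling logic can be exercised without network access. Even at the full allowed rate, a loop like this is bounded by the page size, which is why a dump would help.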

itsezc · 2021-01-10T15:49:48Z

Not the most up to date, but this may be of help: https://github.com/treflehq/dump

SebastianKG added the "enhancement" (New feature or request) label Dec 22, 2020

SebastianKG assigned lambda2 Dec 22, 2020
