Downloadable archive of Trefle database #81

Open
SebastianKG opened this issue Dec 22, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@SebastianKG

Is your feature request related to a problem? Please describe.
This may be a stretch as a "feature", but I'm still looking for a way to get at the underlying dataset as a whole, without being rate-limited. I have a long-running crawler for the API and I have slowly collected a lot of it, but it remains incomplete and probably always will (when crawling lawfully with only one API token, the limit is quite strict). Back in August, we discussed a data dump (here: #44), and the following was said:

"We will soon provide an archive of our database for you to download, and thus avoid iterating on all the plants."

Describe the solution you'd like
I'm sure the project is strapped for developer time and this may not be a priority, but I would love to build and publicize some cool Apache-Spark-aggregated high-level uses for this data. To enable projects like this, a data dump (or a much more lenient page size limit, which would be more expensive for the project, I expect) seems necessary.
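
For context, the kind of slow, rate-limited crawl described above can be sketched roughly as follows. This is only an illustrative sketch, not the actual crawler: `fetch_page` is a hypothetical callable standing in for whatever performs the real API request, and `requests_per_minute` is a placeholder value, since the actual limit is not stated in this thread.

```python
import time
from typing import Callable, Iterator, Optional


def crawl_pages(
    fetch_page: Callable[[int], Optional[list]],  # hypothetical: returns one page of records, or None/[] when done
    requests_per_minute: int = 60,                # placeholder limit, an assumption
    start_page: int = 1,
    sleep: Callable[[float], None] = time.sleep,  # injectable so the loop can be tested without waiting
) -> Iterator[dict]:
    """Yield records page by page, pausing after each request to stay under a rate limit."""
    delay = 60.0 / requests_per_minute
    page = start_page
    while True:
        records = fetch_page(page)
        if not records:
            # An empty or missing page is treated as the end of the dataset.
            break
        yield from records
        page += 1
        # Wait before the next request so the per-minute budget is not exceeded.
        sleep(delay)
```

With a small page size and a strict per-minute budget, a loop like this takes a very long time to cover a large dataset, which is why a downloadable archive would help.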

@SebastianKG SebastianKG added the enhancement New feature or request label Dec 22, 2020
@itsezc

itsezc commented Jan 10, 2021

Not the most up to date, but this may be of help: https://github.com/treflehq/dump
