
datastore_create can't upload a json records file #217

Open
pauloneves opened this issue Jun 26, 2024 · 3 comments
Comments

@pauloneves

The datastore_create() method accepts a records parameter containing a list of dictionaries, which the API converts to JSON.

Every value in those dictionaries must be JSON-serializable: a datetime value in a dict raises a validation error.

To work around this I must convert my pandas dataframe to a JSON string, load that back into Python dictionaries, and then pass those to the method, which converts them to JSON yet again.

That round trip is very inefficient, especially for large datasets.

I'd like to be able to pass a JSON string directly to datastore_create() or datastore_upsert() and have it sent to CKAN as-is.
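For reference, the round trip described above can be sketched roughly like this (the dataframe contents and the `ckan` client variable are illustrative assumptions, not from the issue):

```python
# Sketch of the datetime workaround: datastore_create rejects datetime
# objects, so serialize via pandas' ISO date handling, then parse the
# JSON string back into plain dicts before handing it to ckanapi.
import json

import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 2],
        "when": pd.to_datetime(["2024-06-26", "2024-06-27"]),
    }
)

# to_json(date_format="iso") turns datetimes into ISO-8601 strings;
# json.loads then yields a list of JSON-safe dicts.
records = json.loads(df.to_json(orient="records", date_format="iso"))

# These dicts would then be re-serialized a second time by ckanapi, e.g.:
# ckan.action.datastore_create(resource_id=..., records=records)
```

The double serialization (to_json, json.loads, then json.dumps inside the client) is the inefficiency the issue is about.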

@wardi
Contributor

wardi commented Jun 26, 2024

In general, datastore_create and datastore_upsert are very slow ways of getting data into the datastore. Consider using a postgres COPY command, as xloader and datapusher+ do, to load large datasets efficiently.

Or, if you're interested in making it easier to connect pandas with ckanapi and the datastore API for loading data efficiently, I would definitely entertain a pull request adding a fast path for loading datastore records.
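A minimal sketch of the COPY-based fast path suggested here, assuming a datastore table (named "mytable" below, a placeholder) already created via datastore_create with fields matching the dataframe columns, and a psycopg2 connection to the datastore database; all of those are assumptions, not part of ckanapi:

```python
# Hedged sketch: stream a pandas dataframe into a postgres table with a
# single COPY statement, the approach xloader and datapusher+ take.
import io

import pandas as pd


def frame_to_copy_buffer(df: pd.DataFrame) -> io.StringIO:
    """Serialize a dataframe as header-less CSV, ready for COPY FROM STDIN."""
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    buf.seek(0)
    return buf


def copy_frame(conn, df: pd.DataFrame, table: str = "mytable") -> None:
    """Load the dataframe into `table` via COPY (hypothetical table name)."""
    cols = ", ".join(f'"{c}"' for c in df.columns)
    with conn.cursor() as cur:
        # copy_expert streams the CSV buffer server-side in one round trip,
        # avoiding per-record JSON serialization entirely.
        cur.copy_expert(
            f'COPY "{table}" ({cols}) FROM STDIN WITH (FORMAT csv)',
            frame_to_copy_buffer(df),
        )
    conn.commit()
```

This bypasses the datastore API for the bulk load itself, so it requires direct database access; that is the trade-off behind the "fast path" idea.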

@pauloneves
Author

Is "datapusher+" different from "datapusher"?

I had a lot of problems with datapusher trying to guess my datatypes, so I'm doing the work myself: creating the datastore via the API and uploading data to it so I can control how it is stored.

@wardi
Contributor

wardi commented Jun 26, 2024

Yes: https://github.com/dathere/datapusher-plus analyzes all the data before setting types, so there are no errors on import.
