Use resource schema to set schema fields in data resource that is passed to Factory (and error if absent) #45

Closed
4 tasks
mbeilin opened this issue Jul 6, 2020 · 4 comments

@mbeilin
Contributor

mbeilin commented Jul 6, 2020

When a user uploads a tabular data file to the portal (CKAN), we want to guess the column names correctly so that they can be passed to a DAG as part of the payload.

Acceptance

  • The column names of the uploaded file determine the schema_fields_array payload parameter (currently hard-coded).

Tasks

  • Add a mechanism for guessing column names from a tabular data file (currently CSV only).

  • Populate the schema_fields_array payload parameter using the column-name guessing mechanism above.

  • Write tests that check this functionality.
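The first two tasks could be sketched roughly as follows. This is a minimal illustration using only Python's standard `csv` module, not the connector's actual code: the helper names `guess_column_names` and `build_payload` are hypothetical, and the payload shape (a plain list of names under `schema_fields_array`) is an assumption, since only the key name appears in this issue.

```python
import csv
import io


def guess_column_names(csv_text):
    """Return the cells of the first non-empty row as column names.

    Hypothetical helper: it simply assumes the first non-empty row
    is the header row, which is a much weaker heuristic than a
    dedicated header-guessing library would apply.
    """
    reader = csv.reader(io.StringIO(csv_text))
    for row in reader:
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip blank lines and rows of empty cells
            return cells
    return []


def build_payload(csv_text):
    # Assumed payload shape: only the key name comes from the issue.
    return {"schema_fields_array": guess_column_names(csv_text)}


sample = "id,name,price\n1,apple,0.5\n2,pear,0.7\n"
payload = build_payload(sample)
print(payload)
```

A test for the third task could feed small CSV strings like `sample` to `build_payload` and compare the resulting list with the expected header names.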

@mbeilin mbeilin self-assigned this Jul 6, 2020
@hannelita
Contributor

Relates to #22

@hannelita
Contributor

@mbeilin Would it make sense to just send the file to Airflow and let it infer the fields (and, further on, the types of these fields)? Or should we send the fields beforehand?

@hannelita hannelita added this to the Sprint - 20 July 2020 milestone Jul 6, 2020
@mbeilin
Contributor Author

mbeilin commented Jul 7, 2020

@mbeilin Would it make sense to just send the file to Airflow and let it infer the fields (and, further on, the types of these fields)? Or should we send the fields beforehand?

@hannelita According to datapusher/xloader, header detection is implemented with messytables, so I have just finished implementing a similar mechanism in our aircan-connector (PR).
We can try to detect headers this way and, if detection is aborted for any reason, send schema_fields_array as empty and let pandas do the job (pandas is the module that deals with files in aircan, correct?).
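The fallback described above might look something like the sketch below. It uses the standard `csv` module in place of messytables, so it is not the mechanism from the PR. The function name `schema_fields_or_empty` is hypothetical, and treating blank or duplicate header names as an aborted detection is an assumption made for illustration only.

```python
import csv
import io


def schema_fields_or_empty(csv_text):
    """Return detected header names, or [] when detection is aborted.

    An empty list signals that the downstream consumer should infer
    the fields itself.
    """
    try:
        # strict=True makes the parser raise csv.Error on malformed input
        # such as an unterminated quoted field.
        first_row = next(csv.reader(io.StringIO(csv_text), strict=True))
    except (csv.Error, StopIteration):
        return []
    names = [cell.strip() for cell in first_row]
    # Assumption: blank or duplicate names mean the guess is unreliable.
    if not names or "" in names or len(set(names)) != len(names):
        return []
    return names


print(schema_fields_or_empty("id,name\n1,apple\n"))
print(schema_fields_or_empty("id,id\n1,2\n"))
```

Returning an empty list instead of raising keeps the upload path working even when the header guess fails, at the cost of deferring any error to whatever infers the fields later.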

@rufuspollock rufuspollock changed the title from "Guess ckanext-aircan schema fields (column names)." to "Use resource schema to set schema fields in data resource that is passed to Factory (and error if absent)" Jul 7, 2020
@hannelita
Contributor

As proposed in #1, this is out of scope. We'll send this information in the request.
