You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When user uploads a file with tabular data to portal (CKAN), we want to guess columns names properly, so it would be passed as part of payload to a DAG.
Acceptance
The columns names of the uploaded file determine the schema_fields_array payload parameter(currently hard coded).
Tasks
Add tabular data file (currently CSV) guessing columns names mechanism.
Populate schema_fields_array payload parameter using guessing columns names mechanism above.
Write tests for checking functionality.
The text was updated successfully, but these errors were encountered:
@mbeilin Would it make sense to just send the file to airflow and then let it to infer the fields? (ar further on, the types of these fields?) Or should we send the fields beforehand?
@mbeilin Would it make sense to just send the file to airflow and then let it to infer the fields? (ar further on, the types of these fields?) Or should we send the fields beforehand?
@hannelita according to datapusher/xloader the detecting headers process is implemented with messytables, so I just finished implementing the similar mechanism in our aircan-connector - PR.
We can try to detect headers this way and if for any reason, it was aborted we can send schema_fields_array as empty and let pandas doing the job (pandas is the module dealing with files in aircan, correct?).
rufuspollock
changed the title
Guess ckanext-aircan schema fields (column names).
Use resource schema to set schema fields in data resource that is passed to Factory (and error if absent)
Jul 7, 2020
When user uploads a file with tabular data to portal (CKAN), we want to guess columns names properly, so it would be passed as part of payload to a DAG.
Acceptance
schema_fields_array
payload parameter(currently hard coded).Tasks
Add tabular data file (currently CSV) guessing columns names mechanism.
Populate
schema_fields_array
payload parameter using guessing columns names mechanism above.Write tests for checking functionality.
The text was updated successfully, but these errors were encountered: