What tables to fill when processing a template file? #2

Closed
2 of 4 tasks
nheeren opened this issue Sep 6, 2018 · 3 comments

Comments

@nheeren
Member

nheeren commented Sep 6, 2018

@stefanpauliuk In your comment here, you wrote of a "dataset table entry". Are you referring to the table datasets? It seems I may have overlooked this table. Can you please confirm which tables I need to fill when uploading a dataset (with custom classification)?

So far I know of:

  • classification_definitions
  • classification_items
  • data - will work on this today
  • new datasets

Are there more tables?

datasets already contains entries. Are they generated by another script or should I fill them?

@stefanpauliuk
Member

Yes to classification_definitions, classification_items, and data
For datasets and others: It's a bit messy, unfortunately:

Datasets:
There is a master table with the dataset entries for the data that we have:
https://github.com/IndEcol/IE_data_commons/blob/master/IEDC_content_fill/IEDC_Prototype_Datasets_Batch1_Upload.xlsx
From this xlsx workbook I fill the project, data_group and dataset tables with a script.
Some of the files that you are now parsing already have a dataset entry, some don't, because they were added more recently.
My suggestion:

  1. Check if dataset table entry for dataset_name already exists.
  2. If not, insert the entire column D entries to datasets (this will be the future: data providers send a complete template that is then uploaded)
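Steps 1 and 2 could be sketched roughly as follows. This is only an illustration, not the script used for the IEDC: it uses an in-memory SQLite database as a stand-in for the MySQL server, and everything except the table name datasets and the column dataset_name (the description column, the example entry, and the function name ensure_dataset) is a hypothetical placeholder. The dictionary stands for the column D entries of one template file.

```python
import sqlite3


def ensure_dataset(conn, entry):
    """Insert `entry` into datasets unless its dataset_name already exists.

    Returns True if a new row was inserted, False if one was already there.
    The keys of `entry` are used as column names, so they must come from a
    trusted source (the template parser), never from free user input.
    """
    cur = conn.execute(
        "SELECT 1 FROM datasets WHERE dataset_name = ?",
        (entry["dataset_name"],),
    )
    if cur.fetchone() is not None:
        return False  # step 1: an entry exists, nothing to do

    # step 2: insert all template fields as one new row
    columns = ", ".join(entry)
    placeholders = ", ".join("?" for _ in entry)
    conn.execute(
        f"INSERT INTO datasets ({columns}) VALUES ({placeholders})",
        tuple(entry.values()),
    )
    conn.commit()
    return True


# Demo with a hypothetical, heavily reduced datasets table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets ("
    "id INTEGER PRIMARY KEY, dataset_name TEXT UNIQUE, description TEXT)"
)
template_entry = {
    "dataset_name": "example_dataset",        # hypothetical value
    "description": "parsed from column D",    # hypothetical column
}
first = ensure_dataset(conn, template_entry)   # inserts the row
second = ensure_dataset(conn, template_entry)  # finds the existing row
```

Calling the function twice with the same entry leaves exactly one row in the table, which is the behaviour needed when some files already have a dataset entry and others do not.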

datagroups and projects:
3) If datasets belong to a data group or project, those should also be added. I suggest keeping this as manual work for the moment.

other lookup tables:
4) If datasets contain new licenses/aspects/users, etc., those would also have to be added. I suggest installing a review process here: adding new dimensions, aspects, layers, provenance, etc. should only be allowed by the core development team to avoid chaos.
The things that could be added on the fly are, in my opinion: licences, users, projects, and datagroups. The rest should pass expert review, I think. Note: These are only the rules we install for the prototype here. If people create their own version of the IEDC they can of course do their own thing, anything from creating a data commons for trade flows only to being super open about anything.

@nheeren
Member Author

nheeren commented Sep 6, 2018

  1) & 2) sound good.
  3) If datasets belong to a data group or project, those should also be added. I suggest to keep this for manual work at the moment.

How can I determine if a file belongs to a data group or project?

  4) [...] If datasets contain new licenses/aspects/users, etc. those would also have to be added.

OK

  4) [...] Adding new dimensions, aspects, layers, provenance, etc. should only be allowed by the core development team to avoid chaos.

I propose to control this by means of MySQL user permissions, i.e. the user data_contributor will not be able to write to those tables.

  4) [...] The things that could be added on the fly are, in my opinion: licences, users, projects, and datagroups.

OK

@stefanpauliuk
Member

Q) How can I determine if a file belongs to a data group or project?
A) A dataset belongs to a datagroup if the dataset.datagroup_id tag is not NULL (foreign key: datagroups.id).
A datagroup belongs to a project if the datagroups.project_id tag is not NULL (foreign key: project.id).

Datagroups are described in the datagroups, projects in the projects table.
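The two foreign-key checks above can be combined into a single lookup. The sketch below is only an illustration under assumptions: it uses in-memory SQLite instead of MySQL, assumes the tables are named datasets, datagroups and projects, and reduces them to the columns mentioned in this thread. The function name membership and the demo rows are hypothetical.

```python
import sqlite3

# LEFT JOIN keeps the dataset row even when datagroup_id is NULL,
# in which case project_id comes back as NULL as well.
QUERY = """
SELECT d.datagroup_id, g.project_id
FROM datasets AS d
LEFT JOIN datagroups AS g ON g.id = d.datagroup_id
WHERE d.dataset_name = ?
"""


def membership(conn, dataset_name):
    """Return the datagroup and project ids of a dataset (None if unset)."""
    row = conn.execute(QUERY, (dataset_name,)).fetchone()
    if row is None:
        raise KeyError(f"no dataset named {dataset_name!r}")
    datagroup_id, project_id = row
    return {"datagroup_id": datagroup_id, "project_id": project_id}


# Demo schema, reduced to the keys discussed above (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE projects (id INTEGER PRIMARY KEY);
    CREATE TABLE datagroups (
        id INTEGER PRIMARY KEY,
        project_id INTEGER REFERENCES projects(id)
    );
    CREATE TABLE datasets (
        id INTEGER PRIMARY KEY,
        dataset_name TEXT UNIQUE,
        datagroup_id INTEGER REFERENCES datagroups(id)
    );
    INSERT INTO projects (id) VALUES (1);
    INSERT INTO datagroups (id, project_id) VALUES (1, 1), (2, NULL);
    INSERT INTO datasets (dataset_name, datagroup_id) VALUES
        ('in_group_and_project', 1),
        ('in_group_only', 2),
        ('standalone', NULL);
    """
)

in_both = membership(conn, "in_group_and_project")
group_only = membership(conn, "in_group_only")
standalone = membership(conn, "standalone")
```

A value of None for datagroup_id means the dataset is standalone; a set datagroup_id with project_id None means the datagroup is not attached to any project.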

For examples, I put the latest version (this morning's) of IEDC_Prototype_Datasets_Batch1_Upload.xlsx into the Dropbox folder \Dropbox\G7 RECC\Data\IE_Data_Commons_Prototype\

Good idea to grant write permissions to individual tables, that makes clear what data providers can change and what not!
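As a rough sketch of how such per-table permissions might be set up, the helper below only generates MySQL GRANT statements as strings; it does not connect to a server. The database name iedc, the host, the function name, and the exact names of the review-only tables are assumptions for illustration. Only the user data_contributor and the split between "on the fly" and "expert review" tables come from this thread.

```python
def grant_statements(user, database, writable, read_only, host="localhost"):
    """Build GRANT statements: SELECT+INSERT on writable tables,
    SELECT only on tables reserved for the core development team."""
    account = f"'{user}'@'{host}'"
    statements = [
        f"GRANT SELECT, INSERT ON `{database}`.`{table}` TO {account};"
        for table in writable
    ]
    statements += [
        f"GRANT SELECT ON `{database}`.`{table}` TO {account};"
        for table in read_only
    ]
    return statements


# Tables a data contributor may add rows to (per the discussion above).
writable_tables = [
    "classification_definitions", "classification_items", "data",
    "datasets", "licences", "users", "projects", "datagroups",
]
# Tables that should pass expert review; these names are assumed.
review_only_tables = ["dimensions", "aspects", "layers", "provenance"]

statements = grant_statements(
    "data_contributor", "iedc", writable_tables, review_only_tables
)
for statement in statements:
    print(statement)
```

Because MySQL privileges are additive, a table that receives only SELECT stays read-only for data_contributor as long as no broader grant at database or global level exists for that account.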

@nheeren nheeren closed this as completed May 8, 2019