Improve ID management when using DOSDP #128

Closed
matentzn opened this issue Nov 13, 2018 · 10 comments
Labels
enhancement, wishlist (Stuff that would be nice to have)

Comments

@matentzn
Contributor

One problem I keep stumbling over with the new data pipelines for some of the organisms is novel IDs. I think we need to be able to leave the defined_class column in the DOSDP TSV file empty and have a process that, whenever a field is left blank, populates it with a fresh ID from a given ID range before compiling the pattern to OWL. I am filing this on the ODK repo rather than on DOSDP because I am not sure we want to force DOSDP to be aware of the already occupied ID space, which could be scattered across multiple modules and/or TSV files.

My suggestion is to ship a simple Python script with the ODK that loops through all the TSV files and, whenever it comes across an empty field, draws a new ID. What I am not sure about yet is where and how the various ID ranges should be stored (say, one for patterns, one for manual).
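
A minimal sketch of what such a script might look like, not an existing ODK or DOSDP tool. The `EX:` prefix, the 7-digit padding, and the function names are all hypothetical, and the ID range is simply passed in as arguments because where ranges should be stored is still an open question.

```python
import csv
from pathlib import Path

# Assumptions for illustration only: the CURIE prefix and zero-padding width.
ID_PREFIX = "EX:"
ID_WIDTH = 7


def parse_local_id(curie, prefix=ID_PREFIX):
    """Return the numeric part of a CURIE with the given prefix, else None."""
    local = curie[len(prefix):]
    if curie.startswith(prefix) and local.isdigit():
        return int(local)
    return None


def read_rows(path):
    """Read a TSV file and return (fieldnames, list of row dicts)."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return reader.fieldnames or [], list(reader)


def fill_missing_ids(tsv_paths, range_start, range_end, column="defined_class"):
    """Fill empty `column` cells with unused IDs from [range_start, range_end]."""
    # Only consider files that actually have the target column.
    tables = {}
    for path in tsv_paths:
        fieldnames, rows = read_rows(path)
        if column in fieldnames:
            tables[path] = (fieldnames, rows)

    # First pass: collect every ID already used in any of the files.
    used = set()
    for _, rows in tables.values():
        for row in rows:
            local_id = parse_local_id((row.get(column) or "").strip())
            if local_id is not None:
                used.add(local_id)

    # Second pass: give each blank cell the lowest free ID in the range.
    candidates = (i for i in range(range_start, range_end + 1) if i not in used)
    minted = []
    for path, (fieldnames, rows) in tables.items():
        changed = False
        for row in rows:
            if (row.get(column) or "").strip():
                continue
            try:
                new_id = next(candidates)
            except StopIteration:
                raise RuntimeError("ID range exhausted") from None
            row[column] = f"{ID_PREFIX}{new_id:0{ID_WIDTH}d}"
            minted.append(row[column])
            changed = True
        if changed:
            with open(path, "w", newline="", encoding="utf-8") as handle:
                writer = csv.DictWriter(
                    handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
                )
                writer.writeheader()
                writer.writerows(rows)
    return minted
```

It could be called as, say, `fill_missing_ids(sorted(Path("patterns").glob("*.tsv")), 1000, 1999)`, where the directory and range are again placeholders. The sketch only sees IDs present in the TSV files it is given, so IDs already used elsewhere (for example in OWL modules) would have to be fed into the `used` set some other way.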

@matentzn
Contributor Author

@cmungall @balhoff @dosumis

@balhoff
Member

balhoff commented Nov 13, 2018

Doesn't EBI have a server API that addresses this?

@dosumis
Contributor

dosumis commented Nov 14, 2018

https://github.com/EBISPOT/urigen

It keeps a register of the IDs used, so it doesn't need to look through TSVs and OWL files to check.

Don't think there's a live service anymore. We looked into running this for VFB, but IIRC, we couldn't get it working out of the box. Think it needs some updating.

@gouttegd
Contributor

@matentzn No update in 5 years; is this still desired?

I could consider doing something for that in my dicer tool.

@matentzn
Contributor Author

Very low priority, and it should, if at all, be done by implementing @balhoff's idea of the robot mint command (I think there is a PR for this that is 6 years old). I am also happy to just close this; it has very little ROI.

@gouttegd
Contributor

> should if at all be done by implementing @balhoff idea of the robot mint command

FYI, as I mentioned here, the mint command as envisioned has been implemented as part of my KGCL plugin.

But that command can only mint IDs within an ontology, not within TSV files.

@matentzn
Contributor Author

OK, as I said, for me this is very low priority. It is super easy to spend 2 minutes adding a few new IDs in a Google Sheet, and the many ways in which automatic generation of IDs can go wrong (thinking of runs across branches, etc.) make it IMO not worth the hassle.

@gouttegd
Contributor

OK, keeping this open in the “wishlist” – stuff to do in the parallel universe where we have a lot of time on our hands. ;p

@gouttegd added the wishlist and enhancement labels Feb 16, 2025
@matentzn
Contributor Author

I would close it to clean up the tracker and have whoever wants this reopen it with a strong case.

@gouttegd
Contributor

Since you’re the one who opened the ticket in the first place, if you yourself don’t think it is worth doing, then OK for closing.

@gouttegd closed this as not planned Feb 16, 2025