Issue with format file extensions #19

Open
joncison opened this issue Mar 15, 2020 · 5 comments
Labels
done - pending review Issue / check is implemented, but a review of it is needed.

Comments

@joncison
Contributor

From edamontology/edamontology#421:

  • file_extension in EDAM must be given in lower case
  • file_extension value also appears in hasExactSynonym (preserving capitalisation variants, e.g. all uppercase, where these are the "canonical" variant in use)
@joncison
Contributor Author

@matuskalas - a small detail - do we give e.g. ".txt", "txt", or both? (Probably both?)

@joncison joncison self-assigned this Mar 18, 2020
@joncison
Contributor Author

joncison commented Mar 18, 2020

@albangaignard for my first foray in SPARQL, I'm tackling this query, which addresses (from above):

  • file_extension in EDAM must be given in lower case

but I notice that the pattern for the file_extension property currently allows the use of | (pipe) as a delimiter between multiple values, e.g. yaml|yml.

While this is compact / looks nice, it rather complicates the semantics and downstream uses: file_extension currently means "A string in which one or more commonly used file extensions for a data format are delimited by pipe character(s)." rather than simply "A commonly used file extension for a data format."
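To illustrate the downstream cost, here is a minimal Python sketch, assuming the annotation values arrive as plain strings. The helper name `split_extensions` is hypothetical and not part of EDAM or any library; the point is only that every consumer of a pipe-delimited value has to split it before use.

```python
def split_extensions(value: str) -> list[str]:
    """Split a pipe-delimited file_extension value into single extensions.

    Hypothetical helper for illustration; not part of EDAM tooling.
    """
    # Drop empty pieces so a stray pipe ("yaml|") does not yield "" entries.
    return [part.strip() for part in value.split("|") if part.strip()]


print(split_extensions("yaml|yml"))  # ['yaml', 'yml']
print(split_extensions("txt"))       # ['txt']
```

With one extension per file_extension annotation, this step would disappear for every downstream tool.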

I think @matuskalas the right course is to refactor EDAM so that one extension is given per file_extension? In which case the query becomes:

  • file_extension in EDAM must contain lower-case alphanumeric characters only.
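As a rough sketch of that rule in plain Python (not the actual query or notebook code), assuming "alphanumeric" means ASCII letters and digits: the names `EXTENSION_PATTERN` and `is_valid_extension` are made up for illustration.

```python
import re

# Assumed rule from this thread: a single extension per value,
# made of lower-case ASCII letters and digits only.
EXTENSION_PATTERN = re.compile(r"[a-z0-9]+")


def is_valid_extension(value: str) -> bool:
    """Return True if the whole value matches the assumed rule."""
    # fullmatch rejects values that only partly match, e.g. "yaml|yml".
    return EXTENSION_PATTERN.fullmatch(value) is not None


for candidate in ["yaml", "YAML", ".txt", "yaml|yml", "mp4"]:
    print(candidate, is_valid_extension(candidate))
```

Under this rule a pipe-delimited value such as yaml|yml fails, and so does a value with a leading dot such as ".txt", so the rule would need loosening if the dotted form is also to be allowed.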

Thoughts please!

cc @hmenager @veitveit

@joncison
Contributor Author

PS. @albangaignard my hunch is that most or all of the checks will require some Python programming, so your suggestion to use Jupyter notebooks is a very good one!

@joncison
Contributor Author

UPDATE

I just finished the query, taking the decision that only lowercase alphanumeric characters are allowed in EDAM Format file extensions. cc @matuskalas @veitveit

This being my first foray into Python and SPARQL, in case you have time @albangaignard @hmenager or @hansioan, I'd much appreciate some feedback on the quality of the code, which is included here (from this Jupyter notebook).

@joncison joncison added the done - pending review Issue / check is implemented, but a review of it is needed. label Mar 20, 2020
@joncison
Contributor Author

Just added a check that a label or exact synonym matching the file extension is defined; see this notebook.
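The idea behind such a check can be sketched in plain Python. This is not the notebook's code: the function name `has_matching_name` is hypothetical, the sample values are invented rather than taken from EDAM, and the case-insensitive comparison is an assumption based on the earlier point that synonyms may preserve capitalisation variants.

```python
def has_matching_name(extension: str, label: str, exact_synonyms: list[str]) -> bool:
    """Return True if the label or any exact synonym equals the extension.

    Comparison ignores case (an assumption), so "YAML" matches "yaml".
    """
    wanted = extension.lower()
    names = [label, *exact_synonyms]
    return any(name.lower() == wanted for name in names)


# Invented example records, not real EDAM annotations.
print(has_matching_name("yaml", "YAML", []))        # True
print(has_matching_name("yml", "YAML", ["yaml"]))   # False
print(has_matching_name("yml", "YAML", ["YML"]))    # True
```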

cc @albangaignard @hmenager
