How to extract single terms from a large ontology #93

Open
ramonawalls opened this issue Nov 1, 2018 · 4 comments

@ramonawalls
Collaborator

When you just need a few terms from a very large ontology, it does not make sense to pull the whole ontology into the repo. This is especially true of NCBI Taxon. Is there a strategy for making an import module with just a few terms, without having to download the whole ontology?

@stuckyb
Owner

stuckyb commented Nov 6, 2018

So, a couple of comments. First, when you build an ontology, OntoPilot will pull any external ontologies into the build folder. I recommend explicitly not adding that folder to the repo, since it only contains build artifacts and nothing of interest beyond what is in the rest of the repo. I usually add build to the repo's .gitignore file to make this explicit.

Second, I agree it would be nifty to be able to get terms from a remote ontology without downloading the whole thing. Do you know if there is a good service for doing this? For the general problem of import module extraction, you need to be able to inspect logical axioms and their semantic context, which precludes simple download strategies. For single-term imports, though, just grabbing a relevant OWL snippet would be good enough.

@ramonawalls
Collaborator Author

Good idea for the .gitignore, but I think it was still timing out when pulling in NCBI Taxon. Maybe there are subsets of it available somewhere. I'll look.

I was hoping the OWL API could pull single terms. It looks like this should be possible with the OLS API (https://www.ebi.ac.uk/ols/docs/api; scroll down to "Terms"), using the iri parameter.

@stuckyb
Owner

stuckyb commented Nov 20, 2018

One could use the OWL API to pull single terms, but since it is a low-level software development API and not a web API, it still requires access to the full ontology document.

The OLS API looks like it could be promising, but it is unfortunate that it can't return content as OWL/RDF snippets (or any format parsable by the OWL API).

Out of curiosity, are you seeing the failure when attempting to download the NCBI Taxon ontology, or after it is downloaded, during term extraction? If the latter, I suspect the problem might be memory limitations in the Java runtime environment, which can be adjusted when running OntoPilot. Let me know if you'd like any help trying to solve that.

On a different note, I really wish they would modularize the NCBI taxon ontology. It is so huge as to be practically useless in many applications (e.g., it requires multiple GB of RAM just to parse).

@ramonawalls
Collaborator Author

Realized I never answered your question above. I get the error while trying to download the NCBItaxon ontology, not during term extraction.
