Kabob reactome august #2
Open
ekwhite wants to merge 554 commits into drlivingston:master from UCDenver-ccp:kabob_reactome_august
… to step_b_ontology_to_bio, because the package transfers the hierarchies from the ontologies into BioWorld.
The scripts have been segmented into allegrograph-specific, virtuoso-specific, and common-scripts folders. The AllegroGraph-specific scripts are ready to try now; more work is required for the Virtuoso scripts. At some point the scripts should be refactored, as there is a fair amount of repetition.
Although they are downloaded as OWL files, they are converted to the N-Triples format prior to loading.
Must be in the base directory so it can find project.clj
…the load-request-directory
Mainly in the GGP abstraction hierarchy
To be used to model interactions in general. The INO incorporates aspects of MI, but has a continuous hierarchy to the upper level interaction concept. Use of MI_0000 has been replaced by INO_0000002.
Replaced usage of the ccp namespace with kbio for rules that create bio-entities.
…ization. This is a more natural place to run these rules. Also, for practical purposes, missing RNAs and proteins need to be generated before the biogrid rules run.
The Protein Ontology standardized the namespaces for all external identifiers to the obo namespace. This removed the need for the step_aa rules, which have been moved to the deprecated folder. In their place are new rules that make exact-match statements for PR identifiers that don’t match what the file parser machinery produces, e.g. NCBIGene_1 in PR vs. NCBI_GENE_1 produced by the file parsers.
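A minimal sketch of the identifier mismatch these exact-match rules cover, assuming a simple prefix lookup table. Only the NCBIGene → NCBI_GENE pair comes from the description above; the table, the function names, and the unmapped example id are hypothetical, and the real rules are written in the KaBOB rule language rather than Python.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical lookup table: only the NCBIGene -> NCBI_GENE pair is taken from the
# commit message; other prefixes would have to be added from real parser output.
PR_TO_PARSER_PREFIX = {
    "NCBIGene": "NCBI_GENE",
}


def to_parser_id(pr_id: str) -> Optional[str]:
    """Rewrite a PR-style id such as 'NCBIGene_1' into the parser style 'NCBI_GENE_1'.

    Returns None when there is no underscore or the prefix is not in the table.
    """
    prefix, sep, local = pr_id.partition("_")
    if not sep or prefix not in PR_TO_PARSER_PREFIX:
        return None
    return f"{PR_TO_PARSER_PREFIX[prefix]}_{local}"


def exact_match_pairs(pr_ids: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (pr_id, parser_id) pairs that would need an exact-match statement."""
    for pr_id in pr_ids:
        parser_id = to_parser_id(pr_id)
        if parser_id is not None and parser_id != pr_id:
            yield pr_id, parser_id


print(list(exact_match_pairs(["NCBIGene_1", "UniProtKB_P04637"])))
# [('NCBIGene_1', 'NCBI_GENE_1')]
```

Ids whose prefix is not in the table are skipped rather than guessed, so missing mappings show up as absent exact-match statements instead of wrong ones.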
Added pseudogene, protein-coding gene, and biological region types based on NCBI Gene gene_info data. Removed old rules that were typing genes as RNAs; these will be replaced with rules in step_hcb: generating missing ggp entities.
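The typing can be pictured as a lookup on the type_of_gene column of gene_info. This is a sketch, not the actual rule: the class names are hypothetical, the sample rows are made up, and columns are located by header name so the real file's extra columns are simply ignored.

```python
import csv
import io

# Hypothetical class names for the three gene types added in this change.
# The keys are values of the type_of_gene column in NCBI Gene's gene_info file.
TYPE_OF_GENE_TO_CLASS = {
    "protein-coding": "ProteinCodingGene",
    "pseudo": "Pseudogene",
    "biological-region": "BiologicalRegion",
}


def gene_types(gene_info_text: str) -> dict:
    """Return {GeneID: class name} for rows whose type_of_gene is mapped above."""
    reader = csv.reader(io.StringIO(gene_info_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    header[0] = header[0].lstrip("#")  # the gene_info header line starts with '#tax_id'
    id_col = header.index("GeneID")
    type_col = header.index("type_of_gene")
    types = {}
    for row in reader:
        cls = TYPE_OF_GENE_TO_CLASS.get(row[type_col])
        if cls is not None:  # other types (e.g. ncRNA) are left untyped here
            types[row[id_col]] = cls
    return types


# Made-up rows with a reduced set of columns, for illustration only.
sample = (
    "#tax_id\tGeneID\tSymbol\ttype_of_gene\n"
    "9606\t101\tGENE_A\tprotein-coding\n"
    "9606\t102\tGENE_B\tpseudo\n"
    "9606\t103\tGENE_C\tncRNA\n"
)
print(gene_types(sample))  # {'101': 'ProteinCodingGene', '102': 'Pseudogene'}
```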
Moved them to a new directory
…protein root class
Updated code to handle :body blocks that are SPARQL strings. Removed handling for the :sparql-string block.
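Because :body can now hold either a SPARQL string or a list of triple patterns, the rule loader has to dispatch on its type. The Python below sketches that dispatch under stated assumptions: string bodies are passed through unchanged, and triple-list bodies (modelled here as Python tuples standing in for Livingston’s DSL) are rendered as a SPARQL group graph pattern. The function name is hypothetical.

```python
from typing import Sequence, Tuple, Union

Triple = Tuple[str, str, str]


def render_body(body: Union[str, Sequence[Triple]]) -> str:
    """Hypothetical :body handling.

    A string body is assumed to be SPARQL already and is returned stripped.
    A list/tuple body is treated as (subject, predicate, object) patterns and
    rendered as a SPARQL group graph pattern.
    """
    if isinstance(body, str):
        return body.strip()
    if not isinstance(body, (list, tuple)):
        raise TypeError(f"unsupported :body type: {type(body).__name__}")
    patterns = []
    for triple in body:
        if len(triple) != 3:
            raise ValueError(f"not a triple pattern: {triple!r}")
        patterns.append(" ".join(triple) + " .")
    return "{\n  " + "\n  ".join(patterns) + "\n}"


print(render_body([("?p", "rdf:type", "?cls"), ("?p", "rdfs:label", "?label")]))
```

Accepting both forms in one keyword is what makes a separate :sparql-string block unnecessary.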
Revised node generation to use static node URIs, preventing redundant assertions from different ontology files that express similar knowledge.
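The idea behind static node URIs can be sketched as follows: instead of keeping each imported blank node's own identifier, derive a URI from the restriction's content, so identical restrictions coming from different files map to the same node. This is a minimal sketch under assumptions: the namespace, the `R_` prefix, and SHA-256 are illustrative choices, not KaBOB's actual URI scheme.

```python
import hashlib

# Hypothetical namespace; the real build uses its own URI scheme.
RESTRICTION_NS = "http://example.org/kabob/restriction/"
OWL = "http://www.w3.org/2002/07/owl#"


def restriction_uri(on_property: str, value_predicate: str, value: str) -> str:
    """Derive a static URI from a restriction's content instead of its blank-node id.

    Restrictions with the same property, restriction kind, and filler get the
    same URI, so duplicates from different ontology files collapse to one node.
    """
    key = "\n".join((on_property, value_predicate, value))
    return RESTRICTION_NS + "R_" + hashlib.sha256(key.encode("utf-8")).hexdigest()


prop = "http://example.org/part_of"    # placeholder property
cls = "http://example.org/SomeClass"   # placeholder filler class
a = restriction_uri(prop, OWL + "someValuesFrom", cls)  # e.g. from ontology file 1
b = restriction_uri(prop, OWL + "someValuesFrom", cls)  # same restriction, file 2
c = restriction_uri(prop, OWL + "allValuesFrom", cls)   # different restriction kind
print(a == b, a == c)  # True False
```

When a filler is itself a blank node (a nested restriction), it would need to be hashed first and its URI passed in as `value`.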
Removed deprecated rules.
Replaced use of :sparql-string in rules with :body.
Added ordering to taxon rules to prevent duplicate taxon restrictions from being generated.
Replaced use of the to_be_integrated/ and under_review/ rule directories with under_construction/.
Updated GO MF KR to use the realizes relation.
Replaced usage of the ccp namespace with kice and kbio where appropriate.
1) Implementation of a suite of static rule tests to identify common errors during rule composition. Rules are checked for variable alignment between the head, reify, and body blocks, among other things. Note that the :sparql-string keyword has been discontinued; the :body keyword can now be either a SPARQL string or a list of triples using Livingston’s DSL. To run the static rule tests:

    lein test :only kabob.build.static-rule-tests

Alternatively, you can run the tests individually:

    lein test :only kabob.build.static-rule-tests/test-rule-structure
    lein test :only kabob.build.static-rule-tests/test-whitespace-padded-names
    lein test :only kabob.build.static-rule-tests/test-rules-known-syms
    lein test :only kabob.build.static-rule-tests/test-rules-have-meta
    lein test :only kabob.build.static-rule-tests/test-duplicate-names
    lein test :only kabob.build.static-rule-tests/test-rules-forward-safe
    lein test :only kabob.build.static-rule-tests/test-rule-heads-for-expected-property-namespace
    lein test :only kabob.build.static-rule-tests/test-rules-for-missing-slashes-in-variables

2) Implementation of a suite of validation rules to check for representation faults within a KaBOB instance. When run, these rules add no new triples to the KB except the 4 triples of rule metadata. The validation rules are written so that zero hits is the expected result; any results (>0) indicate a representational issue within KaBOB that needs to be addressed.
3) Excluded redundant restrictions using a new strategy that uses hashing to represent blank nodes. Redundant restrictions are created by importing ontologies that represent duplicate information with blank nodes, e.g. restrictions with identical onProperty and someValuesFrom fillers. Because they use blank nodes, they are imported as unique entities when they should instead be collapsed.
4) The rule directories have been renamed using _0_ and _1_ prefixes to more accurately encode the run order.
5) The GO MF representation was changed from MF-->has_participant-->Protein to MF<--realizes--[anonymous-process]--has_participant-->Protein.
6) NCBI Taxonomy taxonomic rank concepts are now excluded from BioWorld.
7) Handling was added to the Stardog build pipeline to allow the use of named graphs, so each triple is placed in a graph named after its source file (for rule output, the file is named after the rule that generated the triples).
8) The kice namespace is now used in the identifier set generation code (the ccp namespace had remained in use accidentally).
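One of the static checks in item 1, variable alignment, can be pictured as "every variable in the head must be bound by the body or created by the reify block". The Python below is a rough sketch of that single check, not the Clojure test suite: rules are modelled as dicts of plain strings, and it assumes DSL variables are written `?/name` while SPARQL bodies use `?name`, normalizing both to the bare name.

```python
import re

# Matches ?/name (assumed DSL style) or ?name (SPARQL); captures the bare name.
VAR = re.compile(r"\?/?([A-Za-z_][\w-]*)")


def variables(block: str) -> set:
    return set(VAR.findall(block))


def unbound_head_vars(rule: dict) -> set:
    """Variables used in the head that are bound by neither the body nor the reify block."""
    head = variables(rule.get("head", ""))
    bound = variables(rule.get("body", "")) | variables(rule.get("reify", ""))
    return head - bound


# Hypothetical rule, with each block written as a plain string for illustration.
rule = {
    "head": "((?/p rdf:type ?/cls) (?/r obo:IAO_0000142 ?/p))",
    "reify": "[?/r]",
    "body": "{ ?p rdfs:label ?label }",
}
print(sorted(unbound_head_vars(rule)))  # ['cls']
```

A non-empty result flags a rule whose head would assert triples containing an unbound variable.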
* Fixed redundant OWL constructs when importing ontology blank nodes
* Added links from UniProt isoforms to their ‘canonical’ protein using variant_of
* Added labels to every (I think) exhaustive subclass that is created by the KaBOB rules

Note 1: There are still some nodes missing labels. Some are discontinued (NCBI) or withdrawn (HGNC) records that are linked by other sources, but many are genes/proteins from species other than human that are brought in as part of PPIs, e.g. a human protein and a mouse protein that are known to interact. The next build will restrict PPIs to human-human interactions only (I thought this was already the case, but evidently it was not).

Note 2: There are still some redundant restriction classes. When a restriction defined in an ontology is also defined by one of the KaBOB rules, e.g. only_in_taxon restrictions, there will be two copies because of the way the URIs are currently generated. I’ll work towards collapsing these in future releases.
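The isoform-to-canonical link relies on UniProt's identifier convention, where an isoform id is the accession followed by a hyphen and an isoform number (e.g. P04637-2). The sketch below only derives the (isoform, canonical) pairs; the function name is hypothetical and how the variant_of triples are actually emitted by the KaBOB rules is not shown.

```python
import re
from typing import Iterable, Iterator, Tuple

# UniProt isoform identifiers are the accession plus "-<isoform number>", e.g. P04637-2.
ISOFORM_ID = re.compile(r"^([A-Z0-9]+)-(\d+)$")


def variant_of_pairs(ids: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (isoform id, canonical accession) for each id that looks like an isoform."""
    for uniprot_id in ids:
        m = ISOFORM_ID.match(uniprot_id)
        if m:
            yield uniprot_id, m.group(1)


for isoform, canonical in variant_of_pairs(["P04637-2", "P04637", "Q9Y6K9-3"]):
    print(f"{isoform} variant_of {canonical}")
```

Plain accessions such as P04637 do not match the pattern and produce no link.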
Changes from May and July 2018 releases
Add schema diagram
Add rules for parsing basic Reactome entities from BioPAX to KaBOB ICE.
Entities include:
Continuants: proteins, small molecules, physical entities, dnas, rnas, and complexes
Occurrents: biochemical reactions, template reactions, degradations, pathways; controls, template reaction regulations, and pathway steps.
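The continuant/occurrent split above can be read as a grouping over BioPAX Level 3 types. This is a sketch only: the grouping follows the two lists above, the class names are BioPAX Level 3 local names, and the entity URIs in the example are placeholders rather than real Reactome identifiers.

```python
from typing import Dict, Iterable, List, Optional, Tuple

BIOPAX = "http://www.biopax.org/release/biopax-level3.owl#"

# Grouping taken from the lists above.
CONTINUANTS = {"Protein", "SmallMolecule", "PhysicalEntity", "Dna", "Rna", "Complex"}
OCCURRENTS = {
    "BiochemicalReaction", "TemplateReaction", "Degradation", "Pathway",
    "Control", "TemplateReactionRegulation", "PathwayStep",
}


def category(type_uri: str) -> Optional[str]:
    """Return 'continuant', 'occurrent', or None for an rdf:type URI."""
    if not type_uri.startswith(BIOPAX):
        return None
    local = type_uri[len(BIOPAX):]
    if local in CONTINUANTS:
        return "continuant"
    if local in OCCURRENTS:
        return "occurrent"
    return None


def group_entities(typed: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (entity URI, rdf:type URI) pairs by category, skipping unhandled types."""
    groups: Dict[str, List[str]] = {"continuant": [], "occurrent": []}
    for entity, type_uri in typed:
        cat = category(type_uri)
        if cat is not None:
            groups[cat].append(entity)
    return groups


pairs = [
    ("ex:Protein1", BIOPAX + "Protein"),
    ("ex:Reaction1", BIOPAX + "BiochemicalReaction"),
    ("ex:Xref1", BIOPAX + "UnificationXref"),  # not one of the listed entity kinds
]
print(group_entities(pairs))
```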
When these rules are working, there's a lot more ICE to generate.
Watch for the rules returning triples with missing forward slashes (http:/ rather than http://) in the URIs of existing BioPAX entities. Here's an example with one in the first position:

    http:/www.reactome.org/biopax/65/48887#TemplateReactionRegulation8 <http://purl.obolibrary.org/obo/IAO_0000142> <http://ccp.ucdenver.edu/kabob/ice/R_hQXlqE4kme-3HAiVOIU_ZXgx9rU> .
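A quick way to spot these is to scan the rule output for a scheme followed by only one slash. This is a minimal sketch, assuming the output is available as N-Triples lines; the function name is hypothetical and it is not part of the existing static rule tests.

```python
import re
from typing import Iterable, List, Tuple

# "http:/" or "https:/" that is NOT followed by a second slash.
MISSING_SLASH = re.compile(r"https?:/(?!/)")


def find_missing_slashes(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for each N-Triples line with a malformed scheme."""
    return [(n, line) for n, line in enumerate(lines, start=1) if MISSING_SLASH.search(line)]


output = [
    "http:/www.reactome.org/biopax/65/48887#TemplateReactionRegulation8 "
    "<http://purl.obolibrary.org/obo/IAO_0000142> "
    "<http://ccp.ucdenver.edu/kabob/ice/R_hQXlqE4kme-3HAiVOIU_ZXgx9rU> .",
    "<http://example.org/a> <http://example.org/b> <http://example.org/c> .",
]
for n, line in find_missing_slashes(output):
    print(n, line[:60])
```

The negative lookahead keeps well-formed `http://` URIs from matching, so only the first line is reported.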