Kabob reactome october 3 #3
Open: ekwhite wants to merge 563 commits into drlivingston:master from UCDenver-ccp:kabob_reactome_october_3
Although they are downloaded as OWL files, they are converted to the ntriples format prior to loading
Must be in the base directory so it can find project.clj
…the load-request-directory
Incremental update to Overhaul
Moved them to a new directory
…protein root class
Updated code to handle :body blocks that are SPARQL strings. Removed handling for the :sparql-string block.
Revised generation to static node URIs to prevent redundant assertions from different ontology files expressing similar knowledge
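The idea of a static node URI can be sketched as follows: rather than keeping the blank node identifier an ontology file happens to use, the node's URI is derived from a hash of its content, so the same restriction stated in two files resolves to one node. This is a minimal Python illustration, not the KaBOB implementation; the `BASE` namespace, the `restriction_uri` function, the choice of SHA-1, and the example property and class URIs are all assumptions made for the sketch.

```python
import hashlib

# Hypothetical namespace for generated restriction nodes (illustration only).
BASE = "http://example.org/kabob/restriction/"


def restriction_uri(on_property: str, some_values_from: str) -> str:
    """Derive a stable URI from the content of an existential restriction.

    The URI depends only on the property and the filler, never on the
    blank node label used in the source file.
    """
    key = f"{on_property}|{some_values_from}".encode("utf-8")
    return BASE + "R_" + hashlib.sha1(key).hexdigest()


# The same restriction read from two different ontology files.
from_file_a = restriction_uri("http://example.org/prop/only_in_taxon",
                              "http://example.org/taxon/9606")
from_file_b = restriction_uri("http://example.org/prop/only_in_taxon",
                              "http://example.org/taxon/9606")

# A restriction with a different filler gets a different node.
other = restriction_uri("http://example.org/prop/only_in_taxon",
                        "http://example.org/taxon/10090")
```

Because `from_file_a` and `from_file_b` are equal, loading both files asserts the restriction once instead of twice.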
* Removed deprecated rules
* Replaced use of :sparql-string in rules with :body
* Added ordering to taxon rules to prevent duplicate taxon restrictions from being generated
* Replaced use of to_be_integrated/ and under_review/ rule directories with under_construction/
* Updated GO MF KR to use realizes relation
* Replaced usage of ccp ns with kice and kbio where appropriate
1) Implementation of a suite of static rule tests to identify common errors during rule composition. Rules are checked for variable alignment between the head, reify, and body blocks, among other things. Note: the use of the :sparql-string keyword has been discontinued. The :body keyword can now be either a SPARQL string or a list of triples using Livingston’s DSL.
   To run the static rule tests:
     lein test :only kabob.build.static-rule-tests
   Alternatively, you can run the tests individually:
     lein test :only kabob.build.static-rule-tests/test-rule-structure
     lein test :only kabob.build.static-rule-tests/test-whitespace-padded-names
     lein test :only kabob.build.static-rule-tests/test-rules-known-syms
     lein test :only kabob.build.static-rule-tests/test-rules-have-meta
     lein test :only kabob.build.static-rule-tests/test-duplicate-names
     lein test :only kabob.build.static-rule-tests/test-rules-forward-safe
     lein test :only kabob.build.static-rule-tests/test-rule-heads-for-expected-property-namespace
     lein test :only kabob.build.static-rule-tests/test-rules-for-missing-slashes-in-variables
2) Implementation of a suite of validation rules to check for representation faults within a KaBOB instance. These rules are written such that they add no new triples to the KB except for the 4 triples associated with the rule metadata when a rule is run. The validation rules are written such that zero hits is the expected result. If there are >0 results, then that is an indication of a representational issue within KaBOB that needs to be addressed.
3) Excluded redundant restrictions using a new strategy involving hashing for representing blank nodes. Redundant restrictions are created by importing ontologies where duplicate information is represented using blank nodes, e.g. restrictions with identical onProperty and someValuesFrom fillers, but b/c they use blank nodes, they are imported as unique entities when they should instead be collapsed.
4) The rule directories have been renamed using _0_ and _1_ prefixes to more accurately encode the run order.
5) The GO MF representation was changed from MF-->has_participant-->Protein to MF<--realizes--[anonymous-process]--has_participant-->Protein
6) NCBI Taxonomy taxonomic rank concepts are now excluded from BioWorld
7) Handling was added to the Stardog build pipeline to allow for the use of named graphs, so each triple is placed in a graph named after its source file (which for rule output is named after the rule that was run to generate the triples)
8) The kice namespace is now used in the identifier set generation code (the ccp namespace had remained in use accidentally)
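The variable alignment check mentioned in item 1 can be illustrated with a small sketch. It is written in Python for brevity and is not the code in kabob.build.static-rule-tests. The rule layout (a dict with `head`, `reify`, and `body` keys), the `?/` variable prefix, and the function names are assumptions made for the illustration; in particular, `reify` is simplified here to a plain list of the variables it creates.

```python
def variables(triples):
    """Collect every term that looks like a variable (assumed '?/' prefix)."""
    return {term
            for triple in triples
            for term in triple
            if isinstance(term, str) and term.startswith("?/")}


def unbound_head_variables(rule):
    """Return head variables that are neither bound in the body nor reified.

    An empty result means the head only uses variables that the rule
    can actually supply a value for.
    """
    bound = variables(rule["body"]) | set(rule.get("reify", []))
    return sorted(variables(rule["head"]) - bound)


# A well-aligned rule: ?/rec comes from the body, ?/bio is created by reify.
good_rule = {
    "head": [("?/bio", "ex:denotedBy", "?/rec")],
    "reify": ["?/bio"],
    "body": [("?/rec", "rdf:type", "ex:Record")],
}

# A misaligned rule: ?/label is used in the head but never bound.
bad_rule = {
    "head": [("?/bio", "rdfs:label", "?/label")],
    "reify": ["?/bio"],
    "body": [("?/rec", "rdf:type", "ex:Record")],
}

good_result = unbound_head_variables(good_rule)
bad_result = unbound_head_variables(bad_rule)
```

A test in this style would fail for `bad_rule` and report `?/label` as the offending variable, which is the kind of composition error the static tests are meant to catch before a rule is run against the KB.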
* Fixed redundant OWL constructs when importing ontology blank nodes
* Added links from UniProt isoforms to their ‘canonical’ protein using variant_of
* Added labels to every (I think) exhaustive subclass that is created by the kabob rules

Note 1: There are still some nodes missing labels. Some are discontinued (NCBI) or withdrawn (HGNC) records that are linked by other sources. But many are genes/proteins from species other than human that are being brought in as part of PPIs, e.g. a human protein and a mouse protein are known to interact. The next build will restrict PPIs to just human-human interactions (I thought this was already the case but evidently it was not).

Note 2: There are still some redundant restriction classes. When there is a restriction that is defined in an ontology that is also defined by one of the KaBOB rules, e.g. only_in_taxon restrictions, there will be two copies b/c of the way the URIs are currently generated. I’ll work towards collapsing these in future releases.
Changes from May and July 2018 releases
Add schema diagram
add rules to extract basic entities in Reactome from BioPAX to ICE
… to ICE; later rules require these ones to be in first
add rules to extract first round of Reactome class fields from BioPAX
…from BioPAX to ICE; each rule requires information from previous rounds
add rules to extract second and third round of Reactome class fields …
…me_utility_classes_to_ice/step_caao_add_some_identifiers_for_reactome_nucleic_acids_to_ice/add_ncbi_555853_to_ice.clj
removed step_ca_add_reactome_ice/step_cac_add_reactome_class_fields_to_ice/step_caca_add_names_to_ice/add_reactome_names_to_ice.clj
renamed step_ca_add_reactome_ice/step_cab_add_reactome_main_classes_to_ice/step_cabf_add_physical_entities_to_ice/add_reactome_physical_entities_to_ice.clj to step_ca_add_reactome_ice/step_cab_add_reactome_main_classes_to_ice/step_cabf_add_physical_entities_to_ice/add_reactome_physical_entities_to_ice_1.clj
renamed step_ca_add_reactome_ice/step_cac_add_reactome_class_fields_to_ice/step_cacs_add_physical_entities_to_ice/add_reactome_physical_entities_to_ice.clj to step_ca_add_reactome_ice/step_cac_add_reactome_class_fields_to_ice/step_cacs_add_physical_entities_to_ice/add_reactome_physical_entities_to_ice_2.clj
Rewrote ICE side rules to avoid duplicate RDF for GO-CC, GO-MF, GO-BP, and other xrefs
Built BIO side rules with no ICE but denotes
Added _1_post_identifier_merge/step_h_ice_to_bio/step_he_reactome_ice_to_bio/step_hea_add_reactome_continuants_to_bio/step_heag_add_entity_families_to_bio_1/add_entity_families_to_bio_part_1.clj, a rule to get data for entity sets as RDF lists, and _1_post_identifier_merge/step_h_ice_to_bio/step_he_reactome_ice_to_bio/step_hea_add_reactome_continuants_to_bio/step_heah_add_entity_families_to_bio_2/process_reactome_entity_families.clj, a bit of Clojure code that builds the RDF lists from the rule output of add_entity_families_to_bio_part_1.clj. This bit will need a wrapper to run properly.
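For readers unfamiliar with RDF lists, the structure that the second step has to produce can be sketched as below. This is a Python illustration of the standard rdf:first / rdf:rest / rdf:nil encoding only; it does not reproduce process_reactome_entity_families.clj. The function name, the node naming scheme, and the example URIs are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_list_triples(list_id, members):
    """Encode `members` as an RDF list.

    Returns (head_node, triples). Each list cell gets a node named
    `<list_id>_<index>` (a naming scheme assumed for this sketch), with
    rdf:first pointing at the member and rdf:rest at the next cell,
    ending in rdf:nil.
    """
    if not members:
        return RDF + "nil", []
    nodes = [f"{list_id}_{i}" for i in range(len(members))]
    triples = []
    for i, (node, member) in enumerate(zip(nodes, members)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", member))
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples


# Hypothetical entity set with two members.
head, triples = rdf_list_triples(
    "http://example.org/list/entity_set_1",
    ["http://example.org/bio/protein_a", "http://example.org/bio/protein_b"],
)
```

Deriving the cell names from the list identifier and position keeps the output deterministic, so re-running the step over the same rule output yields the same triples.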