Kabob reactome october 3 #3

Open
wants to merge 563 commits into
base: master

@ekwhite commented Nov 8, 2018

Rewrote the ICE-side rules to avoid generating duplicate RDF for GO-CC, GO-MF, GO-BP, and other xrefs.
Built the BIO-side rules so that they use nothing from ICE except denotes.
Added two files:
* _1_post_identifier_merge/step_h_ice_to_bio/step_he_reactome_ice_to_bio/step_hea_add_reactome_continuants_to_bio/step_heag_add_entity_families_to_bio_1/add_entity_families_to_bio_part_1.clj, a rule that gets the data for entity sets so that they can be represented as RDF lists.
* _1_post_identifier_merge/step_h_ice_to_bio/step_he_reactome_ice_to_bio/step_hea_add_reactome_continuants_to_bio/step_heah_add_entity_families_to_bio_2/process_reactome_entity_families.clj, a bit of Clojure code that builds the RDF lists from the rule output of add_entity_families_to_bio_part_1.clj. This code will need a wrapper to run properly.
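
For readers unfamiliar with RDF lists, here is a minimal Python sketch of the standard rdf:first / rdf:rest / rdf:nil encoding that the entity-family step targets. It is an illustration only, not the contents of process_reactome_entity_families.clj: the function name, the node-naming scheme, and the example.org URIs are all assumptions made for this example.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_list_triples(list_base, members):
    """Encode `members` as an RDF collection.

    Returns (head_node, triples). Each list cell gets a URI derived from
    `list_base` (a hypothetical naming scheme), an rdf:first triple that
    points at the member, and an rdf:rest triple that points at the next
    cell, or at rdf:nil for the last cell.
    """
    if not members:
        # An empty RDF list is just rdf:nil, with no triples of its own.
        return RDF + "nil", []
    nodes = [f"{list_base}_node_{i}" for i in range(len(members))]
    triples = []
    for i, member in enumerate(members):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", member))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


# Hypothetical entity family with two members.
head, triples = rdf_list_triples(
    "http://example.org/family_1",
    ["http://example.org/protein_a", "http://example.org/protein_b"],
)
```

Deriving the cell URIs from the family URI, rather than using fresh blank nodes, means that re-running the step would produce the same triples instead of a second copy of the list. Whether the actual Clojure code does this is not stated in this PR.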

bill-baumgartner and others added 30 commits June 22, 2017 00:01
Although they are downloaded as OWL files, they are converted to the
N-Triples format prior to loading
Must be in the base directory so it can find project.clj
bill-baumgartner and others added 30 commits May 3, 2018 13:28
Updated code to handle :body blocks that are SPARQL strings
Removed handling for the :sparql-string block
Revised node URI generation to produce static URIs, preventing redundant
assertions when different ontology files express similar knowledge
Removed deprecated rules
Replaced use of :sparql-string in rules with :body
Added ordering to taxon rules to prevent duplicate taxon restrictions
from being generated
Replaced use of to_be_integrated/ and under_review/ rule directories
with under_construction/
Updated GO MF KR to use realizes relation
Replaced usage of ccp ns with kice and kbio where appropriate
1) Implementation of a suite of static rule tests to identify common errors during rule composition. Among other things, rules are checked for variable alignment between the head, reify, and body blocks. Note: use of the :sparql-string keyword has been discontinued. The :body keyword can now be either a SPARQL string or a list of triples using Livingston’s DSL.
To run the full suite of static rule tests:
lein test :only kabob.build.static-rule-tests
Alternatively, you can run the tests individually:
lein test :only kabob.build.static-rule-tests/test-rule-structure
lein test :only kabob.build.static-rule-tests/test-whitespace-padded-names
lein test :only kabob.build.static-rule-tests/test-rules-known-syms
lein test :only kabob.build.static-rule-tests/test-rules-have-meta
lein test :only kabob.build.static-rule-tests/test-duplicate-names
lein test :only kabob.build.static-rule-tests/test-rules-forward-safe
lein test :only kabob.build.static-rule-tests/test-rule-heads-for-expected-property-namespace
lein test :only kabob.build.static-rule-tests/test-rules-for-missing-slashes-in-variables

2) Implementation of a suite of validation rules to check for representation faults within a KaBOB instance. Running a validation rule adds no new triples to the KB other than the 4 triples of rule metadata. Each rule is written so that zero hits is the expected result; any hits indicate a representational issue within KaBOB that needs to be addressed.

3) Excluded redundant restrictions using a new hashing strategy for representing blank nodes. Redundant restrictions arise when imported ontologies represent duplicate information using blank nodes, e.g. restrictions with identical onProperty and someValuesFrom values. Because they use blank nodes, they are imported as distinct entities when they should instead be collapsed.

4) The rule directories have been renamed using _0_ and _1_ prefixes to more accurately encode the run order.

5) The GO MF representation was changed from MF-->has_participant-->Protein to MF<--realizes--[anonymous-process]--has_participant-->Protein

6) NCBI Taxonomy taxonomic rank concepts are now excluded from BioWorld

7) Handling was added to the Stardog build pipeline to allow for the use of named graphs, so each triple is placed in a graph named after its source file (which for rule output is named after the rule that was run to generate the triples)

8) The kice namespace is now used in the identifier set generation code (the ccp namespace had remained in use accidentally)
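
The hashing strategy in item 3 can be sketched as follows. This is a rough Python illustration under stated assumptions, not the KaBOB implementation: the function names, the example.org base URI, the choice of SHA-256, and the exact fields fed into the hash are all hypothetical. The idea is that a restriction's URI is derived from its content, so two blank nodes that carry the same property and filler collapse to the same node.

```python
import hashlib


def restriction_uri(on_property, some_values_from,
                    base="http://example.org/restriction/"):
    """Derive a static URI from the content of an existential restriction.

    `base` and the hash function are assumptions for this sketch. The
    newline separator cannot occur inside a URI, so distinct
    (property, filler) pairs cannot produce the same key string.
    """
    key = "\n".join(["owl:Restriction", on_property, some_values_from])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return base + digest


def collapse_restrictions(restrictions):
    """Map each blank node id to its content-derived static URI.

    `restrictions` is an iterable of (blank_node_id, on_property, filler).
    Blank nodes with identical property and filler map to the same URI.
    """
    return {bnode: restriction_uri(prop, filler)
            for bnode, prop, filler in restrictions}


# Two ontology files state the same restriction using different blank nodes.
mapping = collapse_restrictions([
    ("_:b1", "http://example.org/part_of", "http://example.org/Cell"),
    ("_:b2", "http://example.org/part_of", "http://example.org/Cell"),
    ("_:b3", "http://example.org/part_of", "http://example.org/Nucleus"),
])
```

In this sketch `_:b1` and `_:b2` receive the same URI while `_:b3` receives a different one, so rewriting blank nodes through the mapping before loading removes the duplicate restriction.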
* Fixed redundant OWL constructs when importing ontology blank nodes
* Added links from UniProt isoforms to their ‘canonical’ protein using variant_of
* Added labels to every (I think) exhaustive subclass that is created by the kabob rules

Note 1: There are still some nodes missing labels. Some are discontinued (NCBI) or withdrawn (HGNC) records that are linked by other sources. But many are genes/proteins from species other than human that are being brought in as part of PPIs, e.g. a human protein and a mouse protein are known to interact. The next build will restrict PPIs to just human-human interactions (I thought this was already the case but evidently it was not).

Note 2: There are still some redundant restriction classes. When there is a restriction that is defined in an ontology that is also defined by one of the KaBOB rules, e.g. only_in_taxon restrictions, there will be two copies b/c of the way the URIs are currently generated. I’ll work towards collapsing these in future releases.
Changes from May and July 2018 releases
add rules to extract basic entities in Reactome from BioPAX to ICE
… to ICE; later rules require these ones to be in first
add rules to extract first round of Reactome class fields from BioPAX
…from BioPAX to ICE; each rule requires information from previous rounds
add rules to extract second and third round of Reactome class fields …
…me_utility_classes_to_ice/step_caao_add_some_identifiers_for_reactome_nucleic_acids_to_ice/add_ncbi_555853_to_ice.clj
* removed step_ca_add_reactome_ice/step_cac_add_reactome_class_fields_to_ice/step_caca_add_names_to_ice/add_reactome_names_to_ice.clj
* renamed step_ca_add_reactome_ice/step_cab_add_reactome_main_classes_to_ice/step_cabf_add_physical_entities_to_ice/add_reactome_physical_entities_to_ice.clj to step_ca_add_reactome_ice/step_cab_add_reactome_main_classes_to_ice/step_cabf_add_physical_entities_to_ice/add_reactome_physical_entities_to_ice_1.clj
* renamed step_ca_add_reactome_ice/step_cac_add_reactome_class_fields_to_ice/step_cacs_add_physical_entities_to_ice/add_reactome_physical_entities_to_ice.clj to step_ca_add_reactome_ice/step_cac_add_reactome_class_fields_to_ice/step_cacs_add_physical_entities_to_ice/add_reactome_physical_entities_to_ice_2.clj
3 participants