testing with a few initial oma sparql queries #3

Merged: 4 commits, May 23, 2024
3 changes: 2 additions & 1 deletion check.sh
@@ -17,7 +17,7 @@ prefixes=$(sparql --results=TSV --data=prefixes.ttl "PREFIX sh:<http://www.w3.or
 echo "Prefixes found"

 project="*"
-while getopts nuhrsgmcpb: option; do
+while getopts nuhrsgmcbop: option; do
 case "$option" in
 p) project="$OPTARG";;
 u) project="uniprot";;
@@ -29,6 +29,7 @@ while getopts nuhrsgmcpb: option; do
 c) project="covid";;
 b) project="bgee";;
 n) project="nextprot";;
+o) project="oma";;
 h) help; exit 0;;
 *) help; exit 1;;
 esac
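As a rough illustration of what the option string `nuhrsgmcbop:` does, here is a minimal sketch of the same `getopts` pattern. In a `getopts` option string, a colon after a letter means that option takes an argument, so here only `-p` reads `$OPTARG`, and `-o` is a plain flag. The function name `select_project` is hypothetical and not part of the scripts; only a subset of the flags is reproduced, and the `help` function from the real scripts is replaced by a non-zero return.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): maps a short option to a
# project name, mirroring the case statement in check.sh.
select_project() {
  # Reset OPTIND locally so the function can be called more than once.
  local OPTIND=1 option project="*"
  # "p:" takes an argument; n, u, b and o are plain flags.
  while getopts "nubop:" option; do
    case "$option" in
      p) project="$OPTARG" ;;
      u) project="uniprot" ;;
      b) project="bgee" ;;
      n) project="nextprot" ;;
      o) project="oma" ;;
      *) return 1 ;;   # unknown option: the real scripts call help and exit
    esac
  done
  printf '%s\n' "$project"
}

select_project -o        # prints: oma
select_project -p oma    # prints: oma
```

With no option at all, the sketch falls back to the default `*`, which matches the `project="*"` initialisation in check.sh.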
3 changes: 2 additions & 1 deletion convertToOneTurtle.sh
@@ -3,7 +3,7 @@


 project="uniprot"
-while getopts nuhrsgmcbp: option; do
+while getopts nuhrsgmcbop: option; do
 case "$option" in
 p) project="$OPTARG";;
 u) project="uniprot";;
@@ -15,6 +15,7 @@ while getopts nuhrsgmcbp: option; do
 c) project="covid";;
 n) project="nextprot";;
 b) project="bgee";;
+o) project="oma";;
 h) help; exit 0;;
 *) help; exit 1;;
 esac
20 changes: 20 additions & 0 deletions oma/01-rat-proteins.ttl
@@ -0,0 +1,20 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:1
# Review comment (Collaborator): You don't have to use numbers. You can name them, e.g. ex:rat-proteins.
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Find all Rattus norvegicus proteins present in the OMA RDF database.""" ;
sh:select """SELECT ?protein ?OMA_link
WHERE
{
?protein a orth:Protein.
?protein orth:organism ?organism.
?inTaxon rdfs:label 'in taxon'@en.
?organism ?inTaxon ?taxon.
?taxon up:scientificName 'Rattus norvegicus'.
?protein rdfs:seeAlso ?OMA_link.
}""" .

16 changes: 16 additions & 0 deletions oma/02-all-species.ttl
@@ -0,0 +1,16 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:2
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
# Review comment (Collaborator): This can also work on the UniProt SPARQL endpoint and the Bgee one. You can add <https://sparql.uniprot.org/sparql/> , <https://www.bgee.org/sparql/> as extra targets.
rdfs:comment """Which species are available in the OMA database, and what are their scientific names?""" ;
sh:select """SELECT ?species ?sciname WHERE
{
?species a up:Taxon.
?species up:scientificName ?sciname.
?species up:rank up:Species.
}""" .

21 changes: 21 additions & 0 deletions oma/03-ins-encoded-proteins.ttl
@@ -0,0 +1,21 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:3
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all proteins in OMA that are encoded by the INS gene, with their mnemonics and evidence types from the UniProt database (federated query).""" ;
sh:select """SELECT DISTINCT ?proteinOMA ?species ?mnemonic ?evidenceType ?UniProt_URI
WHERE {
?proteinOMA a orth:Protein;
orth:organism/obo:RO_0002162/up:scientificName ?species;
rdfs:label 'INS'.
?proteinOMA lscr:xrefUniprot ?UniProt_URI.
#Search the INS gene mnemonics and evidence types from Uniprot database.
service <http://sparql.uniprot.org/sparql> {
?UniProt_URI up:mnemonic ?mnemonic;
up:existence/rdfs:label ?evidenceType. }
}""" .

31 changes: 31 additions & 0 deletions oma/04-orthologs-of-ensembl-gene.ttl
@@ -0,0 +1,31 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:4
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all genes that are orthologous to ENSLACG00000002497 Ensembl gene (identifier)""" ;
sh:select """select ?protein2 ?OMA_LINK
where {
#The tree that contains orthologs. The leaves are proteins.
#This graph pattern defines the relationship protein1 is orthologous to protein2
?cluster a orth:OrthologsCluster.
?cluster orth:hasHomologousMember ?node1.
?cluster orth:hasHomologousMember ?node2.
?node2 orth:hasHomologousMember* ?protein2.
?node1 orth:hasHomologousMember* ?protein1.
########

#Specify the protein to look for its orthologs
?protein1 sio:SIO_010079/lscr:xrefEnsemblGene ensembl:ENSLACG00000002497.
########

#The OMA link to the second protein
?protein2 rdfs:seeAlso ?OMA_LINK.
########

filter(?node1 != ?node2)
}""" .

31 changes: 31 additions & 0 deletions oma/05-paralogs-of-ensembl-gene.ttl
@@ -0,0 +1,31 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:5
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all genes that are paralogous to ENSG00000244734 Ensembl gene (identifier).""" ;
sh:select """select ?protein2 ?OMA_LINK
where {
#The tree that contains paralogs. The leaves are proteins.
#This graph pattern defines the relationship protein1 is paralogous to protein2
?cluster a orth:ParalogsCluster.
?cluster orth:hasHomologousMember ?node1.
?cluster orth:hasHomologousMember ?node2.
?node2 orth:hasHomologousMember* ?protein2.
?node1 orth:hasHomologousMember* ?protein1.
########

#Specify the protein to look for its paralogs
?protein1 sio:SIO_010079/lscr:xrefEnsemblGene ensembl:ENSG00000244734.
########

#The OMA link to the second protein
?protein2 rdfs:seeAlso ?OMA_LINK.
########

filter(?node1 != ?node2)
}""" .

23 changes: 23 additions & 0 deletions oma/06-paralogs-with-uniprot-xrefs.ttl
@@ -0,0 +1,23 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:6
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all genes that are paralogous to HUMAN00529 OMA protein (identifier) and their cross-reference links to OMA and Uniprot.""" ;
sh:select """select ?protein2 ?Uniprot_link
where {
?cluster a orth:ParalogsCluster.
?cluster orth:hasHomologousMember ?node1.
?cluster orth:hasHomologousMember ?node2.
?node2 orth:hasHomologousMember* ?protein2.
?node1 orth:hasHomologousMember* ?protein1.
?protein1 a orth:Protein.
?protein1 dc:identifier 'HUMAN00529'.
?protein2 a orth:Protein.
?protein2 lscr:xrefUniprot ?Uniprot_link.
filter(?node1 != ?node2)
}""" .

23 changes: 23 additions & 0 deletions oma/07-orthologs-with-uniprot-xrefs.ttl
@@ -0,0 +1,23 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:7
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all genes that are orthologous to HUMAN22169 OMA protein (identifier) and their cross-reference links to OMA and Uniprot.""" ;
sh:select """select ?protein2 ?Uniprot_link
where {
?cluster a orth:OrthologsCluster.
?cluster orth:hasHomologousMember ?node1.
?cluster orth:hasHomologousMember ?node2.
?node2 orth:hasHomologousMember* ?protein2.
?node1 orth:hasHomologousMember* ?protein1.
?protein1 a orth:Protein.
?protein1 dc:identifier 'HUMAN22169'.
?protein2 a orth:Protein.
?protein2 lscr:xrefUniprot ?Uniprot_link.
filter(?node1 != ?node2)
}""" .

26 changes: 26 additions & 0 deletions oma/08-rabbit-apoci-orthologs.ttl
@@ -0,0 +1,26 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:8
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all genes per species that are orthologous to Rabbit's APOCI or APOC1 gene and their cross-reference links to OMA and Uniprot including the corresponding Ensembl gene identifier.""" ;
sh:select """select ?protein1 ?protein2 ?geneName2 ?species2 ?Prot2_uniprot ?prot2_ensemblGeneId
where {
?cluster a orth:OrthologsCluster.
?cluster orth:hasHomologousMember ?node1.
?cluster orth:hasHomologousMember ?node2.
?node2 orth:hasHomologousMember* ?protein2.
?node1 orth:hasHomologousMember* ?protein1.
?protein1 a orth:Protein;
orth:organism/obo:RO_0002162/up:scientificName 'Oryctolagus cuniculus';
rdfs:label 'APOCI'.
?protein2 a orth:Protein;
lscr:xrefUniprot ?Prot2_uniprot;
sio:SIO_010079/lscr:xrefEnsemblGene/dc:identifier ?prot2_ensemblGeneId;
rdfs:label ?geneName2;
orth:organism/obo:RO_0002162/up:scientificName ?species2.
filter(?node1 != ?node2)
}""" .
25 changes: 25 additions & 0 deletions oma/09-rabbit-orthologs-of-mouse-homoglobinY.ttl
@@ -0,0 +1,25 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:9
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all rabbit proteins encoded by genes that are orthologous to the mouse hemoglobin Y gene, and their cross-reference links to UniProt.""" ;
sh:select """select distinct ?MOUSE_PROTEIN ?RABIT_PROTEIN ?MOUSE_UNIPROT_XREF ?RABIT_UNIPROT_XREF
where {
?cluster a orth:OrthologsCluster.
?cluster orth:hasHomologousMember ?node1.
?cluster orth:hasHomologousMember ?node2.
?node2 orth:hasHomologousMember* ?RABIT_PROTEIN.
?node1 orth:hasHomologousMember* ?MOUSE_PROTEIN.
?MOUSE_PROTEIN a orth:Protein.
?MOUSE_PROTEIN orth:organism/obo:RO_0002162/up:scientificName 'Mus musculus';
rdfs:label 'HBB-Y';
lscr:xrefUniprot ?MOUSE_UNIPROT_XREF.
?RABIT_PROTEIN a orth:Protein.
?RABIT_PROTEIN orth:organism/obo:RO_0002162/up:scientificName 'Oryctolagus cuniculus' .
?RABIT_PROTEIN lscr:xrefUniprot ?RABIT_UNIPROT_XREF.
filter(?node1 != ?node2)
}""" .
26 changes: 26 additions & 0 deletions oma/10-paralogs-in-human-of-hbb.ttl
@@ -0,0 +1,26 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:10
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all human proteins that are paralogous to the HBB gene and their UniProt cross-references.""" ;
sh:select """select distinct ?PROTEIN_HBB ?IS_PARALOGOUS_TO_PROTEIN ?PARALOG_GENE_LABEL ?HBB_UNIPROT_XREF ?PARALOG_UNIPROT_XREF
where {
?cluster a orth:ParalogsCluster.
?cluster orth:hasHomologousMember ?node1.
?cluster orth:hasHomologousMember ?node2.
?node2 orth:hasHomologousMember* ?PROTEIN_HBB.
?node1 orth:hasHomologousMember* ?IS_PARALOGOUS_TO_PROTEIN.
?PROTEIN_HBB a orth:Protein ;
orth:organism/obo:RO_0002162/up:scientificName 'Homo sapiens';
rdfs:label 'HBB';
lscr:xrefUniprot ?HBB_UNIPROT_XREF .
?IS_PARALOGOUS_TO_PROTEIN a orth:Protein;
orth:organism/obo:RO_0002162/up:scientificName 'Homo sapiens' ;
lscr:xrefUniprot ?PARALOG_UNIPROT_XREF ;
rdfs:label ?PARALOG_GENE_LABEL .
filter(?node1 != ?node2)
}""" .
36 changes: 36 additions & 0 deletions oma/11-percentage-of-proteins-with-paralogs.ttl
@@ -0,0 +1,36 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:11
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """The percentage of proteins in Drosophila melanogaster that have at least one paralogous sequence (protein).""" ;
sh:select """select (((xsd:float(?num_paralogy)*100)/xsd:float(?total)) as ?result)
where {
{
select (count(distinct ?PROTEIN) as ?num_paralogy )
where {
?cluster a orth:ParalogsCluster.
?cluster orth:hasHomologousMember ?node1.
?cluster orth:hasHomologousMember ?node2.
?node2 orth:hasHomologousMember* ?PROTEIN.
?node1 orth:hasHomologousMember* ?IS_PARALOGOUS_TO_PROTEIN.
?PROTEIN a orth:Protein.
?PROTEIN orth:organism/obo:RO_0002162/up:scientificName ?species.
?IS_PARALOGOUS_TO_PROTEIN a orth:Protein.
?IS_PARALOGOUS_TO_PROTEIN orth:organism/obo:RO_0002162/up:scientificName ?species .
values(?species ){( 'Drosophila melanogaster' )}
filter(?node1 != ?node2)
}
}
{
select (count(distinct ?protein_total) as ?total)
where {
?protein_total a orth:Protein .
?protein_total orth:organism/obo:RO_0002162/up:scientificName ?species .
values(?species ){( 'Drosophila melanogaster' )}
}
}
}""" .
22 changes: 22 additions & 0 deletions oma/12-orthologs-between-two-species.ttl
@@ -0,0 +1,22 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:12
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all orthologs between mouse and rabbit, together with their HOG id""" ;
sh:select """select distinct ?MOUSE_PROTEIN ?RABIT_PROTEIN ?HOG
where {
?HOG a orth:OrthologsCluster ;
orth:hasHomologousMember ?node1 ;
orth:hasHomologousMember ?node2 .
?node2 orth:hasHomologousMember* ?RABIT_PROTEIN.
?node1 orth:hasHomologousMember* ?MOUSE_PROTEIN.
?MOUSE_PROTEIN a orth:Protein ;
orth:organism/obo:RO_0002162/up:scientificName 'Mus musculus'.
?RABIT_PROTEIN a orth:Protein ;
orth:organism/obo:RO_0002162/up:scientificName 'Oryctolagus cuniculus' .
filter(?node1 != ?node2)
}""" .
25 changes: 25 additions & 0 deletions oma/13-hog-members-at-level-from-query-protein.ttl
@@ -0,0 +1,25 @@
prefix ex: <https://sparql.omabrowser.org/.well-known/sparql-examples/>
prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdfs:<http://www.w3.org/2000/01/rdf-schema#>
prefix schema:<https://schema.org/>
ex:13
a sh:SPARQLSelectExecutable, sh:SPARQLExecutable ;
sh:prefixes _:sparql_examples_prefixes ;
schema:target <https://sparql.omabrowser.org/sparql/> ;
rdfs:comment """Retrieve all proteins belonging to the Hierarchical Orthologous Group (HOG) at the level 'Vertebrata' to which the human CDIN1 gene belongs, together with their gene name symbol if available.""" ;
sh:select """select distinct ?HOG ?MEMBER ?GENE_LABEL
where {
?HOG a orth:OrthologsCluster ;
orth:hasHomologousMember ?node1 ;
orth:hasTaxonomicRange ?taxRange .
?taxRange orth:taxRange 'Vertebrata' .
?node1 orth:hasHomologousMember* ?query ;
orth:hasHomologousMember* ?MEMBER .
?MEMBER a orth:Protein .
OPTIONAL {
?MEMBER rdfs:label ?GENE_LABEL .
}
?query a orth:Protein ;
orth:organism/obo:RO_0002162/up:scientificName 'Homo sapiens';
rdfs:label 'CDIN1'.
}""" .