qEndpoint Full Text Indexing
In qEndpoint, you can configure the repo_model.ttl file to build full-text or GeoSPARQL indexes.
Several example models are available here, but this page describes how to add a simple node to handle this.
The prefixes used in this page are:
@prefix mdlc: <http://the-qa-company.com/modelcompiler/> .
@prefix my: <http://example.org/#> .
@prefix search: <http://www.openrdf.org/contrib/lucenesail#> .
You can describe a simple node to do text indexing like this one:
# Specify the main node
mdlc:main mdlc:node _:mainNode .
_:mainNode mdlc:type mdlc:luceneNode ;
# Describe the location of the lucene directory, you can use mdlc:parsedString for template strings
mdlc:dirLocation "${locationNative}lucene"^^mdlc:parsedString ;
# Define the language(s) indexed by the Lucene index, here "fr" (French) and "es" (Spanish) (uncomment to add)
# mdlc:luceneLang "es", "fr" ;
# Define the node's ID, this parameter is important if you are using multiple indexes, if this ID is set, you need to add
# the ID in the query
# search:indexid my:luceneIndex ;
# Define the reindex query for the lucene sail, the query should be ordered by ?s
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
# Describe the evaluation mode of the queries, for native or endpointStore storage, use NATIVE
mdlc:luceneEvalMode "NATIVE"^^mdlc:parsedString.
For the location on disk, you can use predefined options such as locationNative; all the predefined options are listed here.
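As a sketch, this is how the same node might look with the two optional parameters from the commented lines enabled. It only reuses names that appear above (my:luceneIndex, the "es" and "fr" languages); how your deployment resolves ${locationNative} depends on your configuration.

```turtle
@prefix mdlc: <http://the-qa-company.com/modelcompiler/> .
@prefix my: <http://example.org/#> .
@prefix search: <http://www.openrdf.org/contrib/lucenesail#> .

mdlc:main mdlc:node _:mainNode .

_:mainNode mdlc:type mdlc:luceneNode ;
    # Index directory, built from the predefined locationNative option
    mdlc:dirLocation "${locationNative}lucene"^^mdlc:parsedString ;
    # Only index Spanish and French literals
    mdlc:luceneLang "es", "fr" ;
    # Name the index; queries must then pass the same search:indexid
    search:indexid my:luceneIndex ;
    mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
    mdlc:luceneEvalMode "NATIVE"^^mdlc:parsedString .
```

Once search:indexid is set here, the matching search:indexid line in the query pattern is no longer optional.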
You can then search with the search virtual properties in your SPARQL queries:
PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
?subj search:matches [
search:query "search terms...";
search:property my:property ;
# specify the index ID of the node, mandatory if it was specified in the model.ttl file.
# search:indexid my:luceneIndex ;
search:score ?score;
search:snippet ?snippet ] .
With qEndpoint, you can configure multiple full-text indexes, each with specific rules to search over it.
In your model.ttl, you can create a Lucene node as explained in the Simple text indexing part, but you can also add filters to the node so that it only handles certain triples. The filters only affect the sails during add/remove/select operations, not during dataset indexing; to compute the index at indexing time, you need to specify a custom mdlc:luceneReindexQuery query.
Start by creating a filter node. Here we will call it _:filterNode, and it will filter the node _:luceneNode:
_:filterNode mdlc:type mdlc:filterNode ;
mdlc:paramFilter [
mdlc:type mdlc:typeFilterLuceneExp
] ;
mdlc:paramLink _:luceneNode .
You describe the type of the filter with the mdlc:type predicate. Several types are available:
- mdlc:typeFilterLuceneExp: only passes the SPARQL queries that contain a Lucene search:matches query.
  Example:
  _:filterNode mdlc:type mdlc:filterNode ;
      mdlc:paramFilter [
          mdlc:type mdlc:typeFilterLuceneExp
      ] ;
      mdlc:paramLink _:luceneNode .
- mdlc:typeFilterLuceneGeoExp: only passes the SPARQL queries that contain a Lucene GeoSPARQL query.
  Example:
  _:filterNode mdlc:type mdlc:filterNode ;
      mdlc:paramFilter [
          mdlc:type mdlc:typeFilterLuceneGeoExp
      ] ;
      mdlc:paramLink _:luceneNode .
- mdlc:predicateFilter: only passes, during add/remove/get, the triples with the described predicate(s).
  - Required param: mdlc:typeFilterPredicate <predicates>
  Example: here my:text1, my:text2 and my:text3 are the filtered predicates, but you can also specify only 1 or more than 3:
  _:filterNode mdlc:type mdlc:filterNode ;
      mdlc:paramFilter [
          mdlc:type mdlc:predicateFilter ;
          # The filtered predicates
          mdlc:typeFilterPredicate my:text1, my:text2, my:text3
      ] ;
      mdlc:paramLink _:luceneNode .
- mdlc:languageFilter: only passes, during add/remove/get, the triples with a literal of a particular language. The mdlc:luceneLang parameter is faster for the Lucene nodes; this filter is mentioned here for custom implementations.
  - Required param: mdlc:languageFilterLang "langs": sets the filtered languages
  - Optional param: mdlc:acceptNoLanguageLiterals []: allows literals without a language to pass
  Example: here "es", "fr" and "it" are the filtered languages, but you can also specify only 1 or more than 3:
  _:filterNode mdlc:type mdlc:filterNode ;
      mdlc:paramFilter [
          mdlc:type mdlc:languageFilter ;
          # The filtered languages
          mdlc:languageFilterLang "es", "fr", "it" ;
          # Do we accept literals without any language
          # mdlc:acceptNoLanguageLiterals []
      ] ;
      mdlc:paramLink _:luceneNode .
- mdlc:typeFilter: only passes, during add/remove/get, the triples with a subject of a particular type. The mdlc:multiFilterNode node is faster and better suited to multiple type checks.
  - Required param: mdlc:typeFilterPredicate <is_of_type>: the predicate that defines the type of a subject
  - Required param: mdlc:typeFilterObject <types>: the filtered types
  Example: here my:type1 and my:type2 are the filtered types, but you can also specify only 1 or more than 2:
  _:filterNode mdlc:type mdlc:filterNode ;
      mdlc:paramFilter [
          mdlc:type mdlc:typeFilter ;
          # The predicate describing the type for a subject
          mdlc:typeFilterPredicate my:oftype ;
          # The filtered types
          mdlc:typeFilterObject my:type1, my:type2
      ] ;
      mdlc:paramLink _:luceneNode .
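Because filters are not applied during dataset indexing, a filtered Lucene node usually needs a reindex query that selects the same triples. The sketch below pairs a mdlc:predicateFilter on my:description with such a query. Three things here are assumptions, not taken from the qEndpoint documentation: attaching the filter node directly to mdlc:main, the directory name lucene-description, and the VALUES-based form of the reindex query.

```turtle
@prefix mdlc: <http://the-qa-company.com/modelcompiler/> .
@prefix my: <http://example.org/#> .

# Assumption: the filter node is the entry point of the model
mdlc:main mdlc:node _:filterNode .

# At add/remove/get time, only triples with my:description reach the index
_:filterNode mdlc:type mdlc:filterNode ;
    mdlc:paramFilter [
        mdlc:type mdlc:predicateFilter ;
        mdlc:typeFilterPredicate my:description
    ] ;
    mdlc:paramLink _:luceneNode .

# At indexing time the filter is not applied, so the reindex query
# restricts ?p to the same predicate (hypothetical query shape)
_:luceneNode mdlc:type mdlc:luceneNode ;
    mdlc:dirLocation "${locationNative}lucene-description"^^mdlc:parsedString ;
    mdlc:luceneReindexQuery "SELECT ?s ?p ?o { VALUES ?p { <http://example.org/#description> } ?s ?p ?o } order by ?s" ;
    mdlc:luceneEvalMode "NATIVE" .
```

Keeping the filter and the reindex query in agreement is the point of the sketch: if they select different triples, the index content will depend on whether it was built by reindexing or by incremental updates.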
Now we can filter our streams, but what if we want to use multiple filters? qEndpoint also has a syntax for that: use the mdlc:paramFilterAnd and mdlc:paramFilterOr predicates inside the mdlc:paramFilter object.
Example 1
_:filterNode mdlc:type mdlc:filterNode ;
mdlc:paramFilter [
mdlc:type mdlc:typeFilterLuceneGeoExp ;
mdlc:paramFilterOr [
mdlc:type mdlc:typeFilterLuceneExp
]
] ;
mdlc:paramLink _:luceneNode .
Here the filter only passes the expressions that contain a GeoSPARQL query or a full-text search query. The mdlc:paramFilterOr object can contain multiple filters; its predicates are the same as those of the mdlc:paramFilter objects.
Example 2
_:filterNode mdlc:type mdlc:filterNode ;
mdlc:paramFilter [
mdlc:type mdlc:typeFilterLuceneExp ;
mdlc:paramFilterAnd [
mdlc:type mdlc:predicateFilter ;
mdlc:typeFilterPredicate my:description ;
]
] ;
mdlc:paramLink _:luceneNode .
In this example, the filter only passes the expressions with a full-text search and the triples with a my:description predicate. It can be used, for example, to index all the descriptions.
The boolean operators have the same precedence as in most programming languages: "and" binds more tightly than "or".
[] mdlc:paramFilter [
mdlc:type <FILTER_A> ;
mdlc:paramFilterAnd [
mdlc:type <FILTER_B>
] ;
mdlc:paramFilterOr [
mdlc:type <FILTER_C>
]
] .
This little example can be translated to this expression:
(FILTER_A and FILTER_B) or FILTER_C
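To make the expression concrete, here is a sketch that substitutes filter types described earlier on this page for the three placeholders. The node names _:filterNode and _:luceneNode are the illustrative ones used in the previous examples.

```turtle
@prefix mdlc: <http://the-qa-company.com/modelcompiler/> .
@prefix my: <http://example.org/#> .

# (full-text query AND my:description predicate) OR GeoSPARQL query
_:filterNode mdlc:type mdlc:filterNode ;
    mdlc:paramFilter [
        # FILTER_A
        mdlc:type mdlc:typeFilterLuceneExp ;
        # and FILTER_B
        mdlc:paramFilterAnd [
            mdlc:type mdlc:predicateFilter ;
            mdlc:typeFilterPredicate my:description
        ] ;
        # or FILTER_C
        mdlc:paramFilterOr [
            mdlc:type mdlc:typeFilterLuceneGeoExp
        ]
    ] ;
    mdlc:paramLink _:luceneNode .
```

Read with the precedence rule, a GeoSPARQL query passes on its own, while a full-text query passes only together with the my:description condition.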
Type filtering is important, but mdlc:typeFilter is not optimized for multiple type checks in the same stream. For that, you need to use a mdlc:multiFilterNode node.
Example
_:multiTypeFilter mdlc:type mdlc:multiFilterNode ;
mdlc:typeFilterPredicate my:typeof ;
mdlc:node [
mdlc:typeFilterObject my:type1;
mdlc:node my:luceneNode1
] , [
mdlc:typeFilterObject my:type2;
mdlc:node my:luceneNode2
] .
In this example, the my:typeof predicate is used as the type predicate and 2 types are selected: the my:type1 type is linked with the my:luceneNode1 node and the my:type2 type with the my:luceneNode2 node (you can specify more than 2 types).
It can be used, for example, to have one luceneNode indexing the clients of a company and another one indexing its products; these 2 sets can be big and obviously do not overlap.
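A sketch of the clients/products scenario could look like the following. The classes my:Client and my:Product, the node names my:clientIndex and my:productIndex, and the use of rdf:type as the type predicate are all hypothetical and chosen only for illustration.

```turtle
@prefix mdlc: <http://the-qa-company.com/modelcompiler/> .
@prefix my: <http://example.org/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix search: <http://www.openrdf.org/contrib/lucenesail#> .

mdlc:main mdlc:node my:multiTypeFilter .

# Route each subject to one index according to its rdf:type
my:multiTypeFilter mdlc:type mdlc:multiFilterNode ;
    mdlc:typeFilterPredicate rdf:type ;
    mdlc:node [
        mdlc:typeFilterObject my:Client ;
        mdlc:node my:clientIndex
    ] , [
        mdlc:typeFilterObject my:Product ;
        mdlc:node my:productIndex
    ] .

# One Lucene node per type, each with its own ID, directory and reindex query
my:clientIndex mdlc:type mdlc:luceneNode ;
    search:indexid my:clientIndex ;
    mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://example.org/#Client>} order by ?s" ;
    mdlc:dirLocation "${locationNative}clientIndex"^^mdlc:parsedString ;
    mdlc:luceneEvalMode "NATIVE" .

my:productIndex mdlc:type mdlc:luceneNode ;
    search:indexid my:productIndex ;
    mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://example.org/#Product>} order by ?s" ;
    mdlc:dirLocation "${locationNative}productIndex"^^mdlc:parsedString ;
    mdlc:luceneEvalMode "NATIVE" .
```

Each index gets a distinct search:indexid so that a query can target clients or products explicitly.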
Node chains are used if you want to combine several nodes together. The type is mdlc:linkedSailNode.
Example
_:lucenechain1 mdlc:type mdlc:linkedSailNode ;
mdlc:node _:lucenesail_fr ,
_:lucenesail_de ,
_:lucenesail_es .
In this example we chain 3 lucene nodes. We can imagine one is only indexing French ("fr"), the 2nd German ("de") and the 3rd Spanish ("es") literals.
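The three chained nodes themselves might be declared as in the sketch below, which uses the mdlc:luceneLang parameter described earlier to give each node one language. The index IDs (my:lucenesail_fr and so on) and the directory names are hypothetical.

```turtle
@prefix mdlc: <http://the-qa-company.com/modelcompiler/> .
@prefix my: <http://example.org/#> .
@prefix search: <http://www.openrdf.org/contrib/lucenesail#> .

_:lucenechain1 mdlc:type mdlc:linkedSailNode ;
    mdlc:node _:lucenesail_fr ,
              _:lucenesail_de ,
              _:lucenesail_es .

# French literals only
_:lucenesail_fr mdlc:type mdlc:luceneNode ;
    search:indexid my:lucenesail_fr ;
    mdlc:luceneLang "fr" ;
    mdlc:dirLocation "${locationNative}lucene_fr"^^mdlc:parsedString ;
    mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
    mdlc:luceneEvalMode "NATIVE" .

# German literals only
_:lucenesail_de mdlc:type mdlc:luceneNode ;
    search:indexid my:lucenesail_de ;
    mdlc:luceneLang "de" ;
    mdlc:dirLocation "${locationNative}lucene_de"^^mdlc:parsedString ;
    mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
    mdlc:luceneEvalMode "NATIVE" .

# Spanish literals only
_:lucenesail_es mdlc:type mdlc:luceneNode ;
    search:indexid my:lucenesail_es ;
    mdlc:luceneLang "es" ;
    mdlc:dirLocation "${locationNative}lucene_es"^^mdlc:parsedString ;
    mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
    mdlc:luceneEvalMode "NATIVE" .
```

Since several indexes coexist in the chain, each node sets its own search:indexid and writes to its own directory.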
In this part we are using the Cocktails dataset.
We will first use one index, then split this index by type and, finally, by language.
First, create a simple Lucene index with the following model (don't forget to reindex the dataset in the control menu if you're using an already indexed dataset):
# Define main node
mdlc:main mdlc:node _:mainNode .
# Create full text search Lucene index
_:mainNode mdlc:type mdlc:luceneNode ;
# Describe the location of the lucene directory, you can use mdlc:parsedString for template strings
mdlc:dirLocation "${locationNative}lucene"^^mdlc:parsedString ;
# Define the reindex query for the lucene sail, the query should be ordered by ?s
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
# Describe the evaluation mode of the queries, for native or endpointStore storage, use NATIVE
mdlc:luceneEvalMode "NATIVE" .
This will create an index that will parse all the text literals and allow us to find them.
You can then run your queries, for example this one, which finds cocktails containing "Margarita" in their labels.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
PREFIX cocktail: <http://vocabulary.semantic-web.at/cocktail-ontology/>
SELECT * WHERE {
?subj search:matches [
search:query "margarita" ;
search:property rdfs:label ;
] .
?subj a cocktail:Cocktail.
?subj rdfs:label ?name .
FILTER (LANG(?name) = "en")
} LIMIT 100
Notice that we first run the full-text search, then remove everything that isn't a cocktail, and then everything that isn't an English literal. In the next sections we will see how to use multiple indexes so that we don't have to do this at query time.
Using a mdlc:multiFilterNode, we can split our index into 3 indexes, one for each of the 3 types cocktail:Cocktail, cocktail:Ingredients and cocktail:Beverages.
@prefix my: <http://example.org/#> .
@prefix cocktail: <http://vocabulary.semantic-web.at/cocktail-ontology/> .
mdlc:main mdlc:node my:multiTypeFilter .
my:multiTypeFilter mdlc:type mdlc:multiFilterNode ;
mdlc:typeFilterPredicate rdf:type ;
mdlc:node [
mdlc:typeFilterObject cocktail:Cocktail ;
mdlc:node my:fulltextindexCocktail
] , [
mdlc:typeFilterObject cocktail:Ingredients ;
mdlc:node my:fulltextindexIngredients
] , [
mdlc:typeFilterObject cocktail:Beverages ;
mdlc:node my:fulltextindexBeverages
] .
my:fulltextindexCocktail mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexCocktail ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Cocktail>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexCocktail"^^mdlc:parsedString ;
mdlc:luceneEvalMode "NATIVE".
my:fulltextindexIngredients mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexIngredients ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Ingredients>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexIngredients"^^mdlc:parsedString ;
mdlc:luceneEvalMode "NATIVE".
my:fulltextindexBeverages mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexBeverages ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Beverages>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexBeverages"^^mdlc:parsedString ;
mdlc:luceneEvalMode "NATIVE".
We can then run our query again:
PREFIX my: <http://example.org/#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
PREFIX cocktail: <http://vocabulary.semantic-web.at/cocktail-ontology/>
SELECT * WHERE {
?subj search:matches [
search:query "margarita" ;
search:indexid my:fulltextindexCocktail ;
search:property rdfs:label ;
] .
# ?subj a cocktail:Cocktail.
?subj rdfs:label ?name .
FILTER (LANG(?name) = "en")
} LIMIT 100
You can see that the triple pattern checking the type of the subject isn't required anymore, since we are using the index my:fulltextindexCocktail, which only contains cocktails.
With our type splitting done, we can now split by literal language.
To do so, we are going to use the mdlc:languageFilterLang property of the Lucene index, which is lighter than using one filter per language. We also need to link the indexes; for that, we use the mdlc:linkedSailNode node.
@prefix my: <http://example.org/#> .
@prefix cocktail: <http://vocabulary.semantic-web.at/cocktail-ontology/> .
mdlc:main mdlc:node my:multiTypeFilter .
my:multiTypeFilter mdlc:type mdlc:multiFilterNode ;
mdlc:typeFilterPredicate rdf:type ;
mdlc:node [
mdlc:typeFilterObject cocktail:Cocktail ;
mdlc:node [
mdlc:type mdlc:linkedSailNode ;
mdlc:node my:fulltextindexCocktailFr ,
my:fulltextindexCocktailEn ,
my:fulltextindexCocktailIt
]
] , [
mdlc:typeFilterObject cocktail:Ingredients ;
mdlc:node [
mdlc:type mdlc:linkedSailNode ;
mdlc:node my:fulltextindexIngredientsFr ,
my:fulltextindexIngredientsEn ,
my:fulltextindexIngredientsIt
]
] , [
mdlc:typeFilterObject cocktail:Beverages ;
mdlc:node [
mdlc:type mdlc:linkedSailNode ;
mdlc:node my:fulltextindexBeveragesFr ,
my:fulltextindexBeveragesEn ,
my:fulltextindexBeveragesIt
]
] .
### Cocktail indexes
my:fulltextindexCocktailFr mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexCocktailFr ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Cocktail>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexCocktailFr"^^mdlc:parsedString ;
mdlc:languageFilterLang "fr" ;
mdlc:luceneEvalMode "NATIVE".
my:fulltextindexCocktailEn mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexCocktailEn ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Cocktail>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexCocktailEn"^^mdlc:parsedString ;
mdlc:languageFilterLang "en" ;
mdlc:luceneEvalMode "NATIVE".
my:fulltextindexCocktailIt mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexCocktailIt ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Cocktail>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexCocktailIt"^^mdlc:parsedString ;
mdlc:languageFilterLang "it" ;
mdlc:luceneEvalMode "NATIVE".
### Ingredients indexes
my:fulltextindexIngredientsFr mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexIngredientsFr ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Ingredients>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexIngredientsFr"^^mdlc:parsedString ;
mdlc:languageFilterLang "fr" ;
mdlc:luceneEvalMode "NATIVE".
my:fulltextindexIngredientsEn mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexIngredientsEn ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Ingredients>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexIngredientsEn"^^mdlc:parsedString ;
mdlc:languageFilterLang "en" ;
mdlc:luceneEvalMode "NATIVE".
my:fulltextindexIngredientsIt mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexIngredientsIt ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Ingredients>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexIngredientsIt"^^mdlc:parsedString ;
mdlc:languageFilterLang "it" ;
mdlc:luceneEvalMode "NATIVE".
### Beverages indexes
my:fulltextindexBeveragesFr mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexBeveragesFr ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Beverages>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexBeveragesFr"^^mdlc:parsedString ;
mdlc:languageFilterLang "fr" ;
mdlc:luceneEvalMode "NATIVE".
my:fulltextindexBeveragesEn mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexBeveragesEn ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Beverages>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexBeveragesEn"^^mdlc:parsedString ;
mdlc:languageFilterLang "en" ;
mdlc:luceneEvalMode "NATIVE".
my:fulltextindexBeveragesIt mdlc:type mdlc:luceneNode ;
search:indexid my:fulltextindexBeveragesIt ;
mdlc:luceneReindexQuery "SELECT * {?s ?p ?o ; a <http://vocabulary.semantic-web.at/cocktail-ontology/Beverages>} order by ?s" ;
mdlc:dirLocation "${locationNative}fulltextindexBeveragesIt"^^mdlc:parsedString ;
mdlc:languageFilterLang "it" ;
mdlc:luceneEvalMode "NATIVE".
We can then run our query again, this time without the language filter, using the English index:
PREFIX my: <http://example.org/#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
PREFIX cocktail: <http://vocabulary.semantic-web.at/cocktail-ontology/>
SELECT * WHERE {
?subj search:matches [
search:query "margarita" ;
search:indexid my:fulltextindexCocktailEn ;
search:property rdfs:label ;
] .
# ?subj a cocktail:Cocktail.
?subj rdfs:label ?name .
# FILTER (LANG(?name) = "en")
} LIMIT 100
Here we don't need to search for the language anymore.