Merge branch 'master' into 4103-do-not-compact-self
imsdu authored Aug 23, 2023
2 parents 2d931e5 + 0c5760b commit 8b42340
Showing 12 changed files with 385 additions and 95 deletions.
173 changes: 173 additions & 0 deletions docs/src/main/paradox/docs/delta/api/views/composite-sink.md
@@ -0,0 +1,173 @@
# Composite Sinks

A Composite Sink handles the following steps of composite view indexing:

* Querying for the graphs of resources in the Blazegraph common namespace of the view
* Converting the obtained graphs into a format that can be pushed to a target sink
* Finally, pushing the resources to the target sink

These steps can be implemented in different ways. In Nexus Delta, there are two kinds of Composite Sink that can be
selected via configuration.

1. @ref:[Single Composite Sink](#single-composite-sink)
2. @ref:[Batch Composite Sink](#batch-composite-sink)

## Single Composite Sink

By default, Nexus Delta will use the Single Composite Sink. This sink performs one query to the Blazegraph common namespace for each resource in the project. The queries are done in chronological order (by the `updatedAt` time of the resources).

We recommend reading through the @ref:[search configuration example use case](../../../getting-started/running-nexus/search-configuration.md#example-use-case) and the @ref:[Composite View API reference](composite-view-api.md) to learn more about Composite Views.

## Batch Composite Sink

Starting with Delta 1.9, it is possible to configure Nexus Delta to use a Batch Composite Sink. This implementation of
the Composite Sink can query the Blazegraph common namespace for multiple resource IDs at the same time.

@@@ note { .warning }

We recommend starting with the default @ref:[Single Composite Sink](#single-composite-sink) when you first use Composite Views. Once
you have a good understanding of it, the Batch Composite Sink can be used to improve the performance of your deployment.

@@@

### Configuring the Batch Composite Sink

In order to enable the Batch Composite Sink, configure the following Nexus Delta property:

`plugins.composite-views.sink-config = batch`

Furthermore, you can configure the maximum size of a batch and the maximum interval using:

`plugins.composite-views.{projection-plugin}-batch.max-elements = {max-elements}`

`plugins.composite-views.{projection-plugin}-batch.max-interval = {max-interval}`

where:

* `{projection-plugin}` is either `elasticsearch` or `blazegraph`. The batching options of a Composite Sink can be set separately for each target type.
* `{max-elements}` is the maximum number of elements to batch at once (defaults to `10`)
* `{max-interval}` is the maximum time to wait for a batch to reach `{max-elements}` elements
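
The interplay of the two settings can be sketched as follows. This is a minimal illustration in Python, not Nexus Delta code. It assumes that a batch is emitted as soon as it holds `max-elements` elements, or once `max-interval` has elapsed since its first element, whichever comes first. The function name `group_within` and the `(timestamp, resource_id)` input shape are hypothetical.

```python
from typing import Iterable, List, Tuple


def group_within(
    events: Iterable[Tuple[float, str]],
    max_elements: int,
    max_interval: float,
) -> List[List[str]]:
    """Group (timestamp_seconds, resource_id) events into batches.

    Assumed semantics (not taken from the Nexus Delta source): a batch is
    closed when it reaches max_elements, or when an event arrives more than
    max_interval seconds after the first event of the current batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    batch_start = 0.0
    for timestamp, resource_id in events:
        # The interval of the current batch has expired: flush it first.
        if current and timestamp - batch_start > max_interval:
            batches.append(current)
            current = []
        if not current:
            batch_start = timestamp
        current.append(resource_id)
        # The batch is full: flush it immediately.
        if len(current) == max_elements:
            batches.append(current)
            current = []
    # Flush whatever is left once the input is exhausted.
    if current:
        batches.append(current)
    return batches


events = [(0.0, "a"), (0.1, "b"), (0.2, "c"), (5.0, "d")]
print(group_within(events, max_elements=2, max_interval=1.0))
```

A real implementation would flush on a timer rather than wait for the next element to arrive. The sketch replays timestamps only to keep the example deterministic.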

### How to write a SPARQL construct query for the Batch Composite Sink

In order to use the Batch Composite Sink successfully, it is necessary to rework some aspects of a regular SPARQL
construct query. We explain the changes through an example.

#### Example

Suppose we are in a situation where Composite Views are using the Single Composite Sink and have the following query:

```
PREFIX schema: <http://schema.org/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
CONSTRUCT {
?id nxv:name ?name ;
nxv:age ?age ;
nxv:parent ?parent .
} WHERE {
BIND({resource_id} AS ?id) .
?id schema:name ?name .
OPTIONAL { ?id schema:age ?age }
OPTIONAL { ?id schema:parent ?parent . }
}
```

Using the default Single Composite Sink, Nexus Delta will query the resources Alice and Bob individually and obtain the
following n-triples from Blazegraph:

```
<http://people.com/Alice> <http://schema.org/name> <Alice>
<http://people.com/Alice> <http://schema.org/parent> <http://people.com/Bob>
```

```
<http://people.com/Bob> <http://schema.org/name> <Bob>
<http://people.com/Bob> <http://schema.org/age> <42>
```

In particular, note that when looking at the graph of Alice, we do not know the age of Bob.

The first change needed to make this query work with batches of resources is to replace
`BIND({resource_id} AS ?id) .` with `VALUES ?id { {resource_id} } .`. Nexus Delta uses this template and
replaces `{resource_id}` with multiple resource IDs when it receives a batch of more than one element. The query is now:

```
PREFIX schema: <http://schema.org/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
CONSTRUCT {
?id nxv:name ?name ;
nxv:age ?age ;
nxv:parent ?parent .
} WHERE {
VALUES ?id { {resource_id} } .
?id schema:name ?name .
OPTIONAL { ?id schema:age ?age }
OPTIONAL { ?id schema:parent ?parent . }
}
```
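
The substitution performed on this template can be pictured with a short Python sketch. It assumes that each resource ID is rendered as an IRI in angle brackets and that the IDs are separated by spaces inside the `VALUES` block. The function name `render_batch_query` is hypothetical, and the actual Nexus Delta implementation may differ.

```python
from typing import Sequence

# The batch query template from above, with the {resource_id} placeholder.
TEMPLATE = """\
PREFIX schema: <http://schema.org/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
CONSTRUCT {
  ?id nxv:name ?name ;
      nxv:age ?age ;
      nxv:parent ?parent .
} WHERE {
  VALUES ?id { {resource_id} } .
  ?id schema:name ?name .
  OPTIONAL { ?id schema:age ?age }
  OPTIONAL { ?id schema:parent ?parent }
}"""


def render_batch_query(template: str, resource_ids: Sequence[str]) -> str:
    """Replace the {resource_id} placeholder with one IRI per batched resource."""
    if not resource_ids:
        raise ValueError("a batch must contain at least one resource ID")
    iris = " ".join(f"<{rid}>" for rid in resource_ids)
    # str.replace is used rather than str.format because the other
    # braces in the SPARQL text are not format fields.
    return template.replace("{resource_id}", iris)


query = render_batch_query(
    TEMPLATE, ["http://people.com/Alice", "http://people.com/Bob"]
)
print(query)
```

With a single ID the same template produces a one-element `VALUES` block, so the query also works for a batch of one.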

With the Batch Composite Sink enabled, if Alice and Bob are batched together, this query will result in the following
triples:

```
<http://people.com/Alice> <http://schema.org/name> <Alice>
<http://people.com/Alice> <http://schema.org/parent> <http://people.com/Bob>
<http://people.com/Bob> <http://schema.org/name> <Bob>
<http://people.com/Bob> <http://schema.org/age> <42>
```

Note how the results are the merged result of the individual queries. While we were able to query several resources
simultaneously, we are now facing a framing problem. If we try to frame `http://people.com/Alice`, its graph now
contains more information than before; it will now include the age of Bob, something that we did not request.
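
The leak can be reproduced with a few lines of Python. The sketch below is an illustration only: it treats the merged result as a list of `(subject, predicate, object)` tuples and approximates "the graph of a resource" as everything reachable from its root node, which is a simplification of JSON-LD framing. The helper name `reachable_triples` is hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# The merged batch result shown above, as plain tuples.
MERGED: List[Triple] = [
    ("http://people.com/Alice", "http://schema.org/name", "Alice"),
    ("http://people.com/Alice", "http://schema.org/parent", "http://people.com/Bob"),
    ("http://people.com/Bob", "http://schema.org/name", "Bob"),
    ("http://people.com/Bob", "http://schema.org/age", "42"),
]


def reachable_triples(triples: List[Triple], root: str) -> List[Triple]:
    """Collect every triple reachable from root by following objects that are also subjects."""
    seen: Set[str] = set()
    pending = [root]
    result: List[Triple] = []
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for triple in triples:
            if triple[0] == node:
                result.append(triple)
                # The object may itself be the subject of further triples.
                pending.append(triple[2])
    return result


alice_graph = reachable_triples(MERGED, "http://people.com/Alice")
bob_age = ("http://people.com/Bob", "http://schema.org/age", "42")
# Bob's age is part of Alice's graph, although a query for Alice alone never returned it.
print(bob_age in alice_graph)
```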

In order to solve this problem, we will introduce aliasing for the root resource IDs. The query will now become:

```
PREFIX schema: <http://schema.org/>
PREFIX nxv: <https://bluebrain.github.io/nexus/vocabulary/>
CONSTRUCT {
?alias nxv:name ?name ;
nxv:age ?age ;
nxv:parent ?parent .
} WHERE {
VALUES ?id { {resource_id} } .
BIND(IRI(CONCAT(STR(?id), '/alias')) AS ?alias) .
?id schema:name ?name .
OPTIONAL { ?id schema:age ?age }
OPTIONAL { ?id schema:parent ?parent }
}
```

With this query, a batch query for both Alice and Bob will now yield:

```
<http://people.com/Alice/alias> <http://schema.org/name> <Alice>
<http://people.com/Alice/alias> <http://schema.org/parent> <http://people.com/Bob>
<http://people.com/Bob/alias> <http://schema.org/name> <Bob>
<http://people.com/Bob/alias> <http://schema.org/age> <42>
```

You can see that the root node of Bob's graph is now `http://people.com/Bob/alias`, while Alice's parent
is `http://people.com/Bob`. This distinction ensures that we cannot get Bob's age by looking at Alice's graph, thus
reproducing the behavior that we had with the Single Composite Sink.

Nexus Delta takes care of framing these results so that the framed documents will be the same as with the Single
Composite Sink, and will not contain any `alias` keyword. For example, for the resource `http://people.com/Alice`, Nexus
Delta will obtain its graph by looking at
the `http://people.com/Alice/alias` root node, and use the resulting graph (removing the `/alias` part) for JSON-LD
framing.
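
The alias handling described above can be sketched in Python as follows. This is not the Nexus Delta implementation: triples are modelled as plain tuples, reachability from the root node stands in for JSON-LD framing, and the names `ALIAS_SUFFIX` and `graph_for` are hypothetical.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

ALIAS_SUFFIX = "/alias"

# The aliased batch result shown above, as plain tuples.
BATCH_RESULT: List[Triple] = [
    ("http://people.com/Alice/alias", "http://schema.org/name", "Alice"),
    ("http://people.com/Alice/alias", "http://schema.org/parent", "http://people.com/Bob"),
    ("http://people.com/Bob/alias", "http://schema.org/name", "Bob"),
    ("http://people.com/Bob/alias", "http://schema.org/age", "42"),
]


def graph_for(triples: List[Triple], resource_id: str) -> List[Triple]:
    """Return the graph of one resource from a batch result, with the alias removed."""
    root = resource_id + ALIAS_SUFFIX
    seen: Set[str] = set()
    pending = [root]
    result: List[Triple] = []
    while pending:
        node = pending.pop()
        if node in seen:
            continue
        seen.add(node)
        for subject, predicate, obj in triples:
            if subject == node:
                # Only the root node carries the alias: restore the original ID.
                restored = resource_id if subject == root else subject
                result.append((restored, predicate, obj))
                pending.append(obj)
    return result


for triple in graph_for(BATCH_RESULT, "http://people.com/Alice"):
    print(triple)
```

Because `http://people.com/Bob` never appears as a subject in the batch result, the traversal stops at Alice's `parent` link and Bob's age stays out of Alice's graph.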

#### Summary

To use the Batch Composite Sink the following changes are necessary:

* `BIND({resource_id} AS ?id)` must become `VALUES ?id { {resource_id} }` to allow for batches of resources to be
queried from Blazegraph.
* `BIND(IRI(CONCAT(STR(?id), '/alias')) AS ?alias)` must be added, and the relevant `?id` "root nodes" replaced
by `?alias` in the `CONSTRUCT` part of the query. This avoids clashes between the graphs of
several resources. Note that if you have several "root nodes" in the `CONSTRUCT` part of your query,
you might need several aliases.
@@ -1,3 +1,9 @@
@@@ index

* @ref:[Composite sink](composite-sink.md)

@@@

# CompositeView

This view is composed by multiple `sources` and `projections`.
@@ -231,6 +237,9 @@ where...
execute against the intermediate Sparql space for each target resource.
- `{permission}`: String - the permission necessary to query this projection. Defaults to `views/query`.

## Batching queries to the intermediate space

The queries that projections perform to the intermediate Sparql space can be either executed per individual resource, or in batches containing multiple resources. To learn more about it, please refer to the @ref:[Composite Sink page](composite-sink.md).

## Payload

2 changes: 2 additions & 0 deletions storage/src/main/resources/app.conf
@@ -34,6 +34,8 @@ app {
storage {
# the absolute path where the files are stored
root-volume = "/tmp"
# additional path prefixes from which it is allowed to link
extra-prefixes = []
# the relative path of the protected directory once the storage bucket is selected
protected-directory = "nexus"
# permissions fixer
@@ -33,11 +33,11 @@ object Rejection {
* @param name
* the storage bucket name
* @param path
- * the relative path to the file
+ * the path to the file
*/
final case class PathAlreadyExists(name: String, path: Path)
extends Rejection(
- s"The provided location inside the bucket '$name' with the relative path '$path' already exists."
+ s"The provided location inside the bucket '$name' with the path '$path' already exists."
)

/**
@@ -46,11 +46,11 @@ object Rejection {
* @param name
* the storage bucket name
* @param path
- * the relative path to the file
+ * the path to the file
*/
final case class PathNotFound(name: String, path: Path)
extends Rejection(
- s"The provided location inside the bucket '$name' with the relative path '$path' does not exist."
+ s"The provided location inside the bucket '$name' with the path '$path' does not exist."
)

/**
@@ -59,11 +59,11 @@ object Rejection {
* @param name
* the storage bucket name
* @param path
- * the relative path to the file
+ * the path to the file
*/
final case class PathContainsLinks(name: String, path: Path)
extends Rejection(
- s"The provided location inside the bucket '$name' with the relative path '$path' contains links. Please remove them in order to proceed with this call."
+ s"The provided location inside the bucket '$name' with the path '$path' contains links. Please remove them in order to proceed with this call."
)

/**
@@ -56,11 +56,11 @@ object StorageError {
* @param name
* the storage bucket name
* @param path
- * the relative path to the file
+ * the path to the file
*/
final case class PathNotFound(name: String, path: Path)
extends StorageError(
- s"The provided location inside the bucket '$name' with the relative path '$path' does not exist."
+ s"The provided location inside the bucket '$name' with the path '$path' does not exist."
)

/**
@@ -69,11 +69,11 @@ object StorageError {
* @param name
* the storage bucket name
* @param path
- * the relative path to the file
+ * the path to the file
*/
final case class PathInvalid(name: String, path: Path)
extends StorageError(
- s"The provided location inside the bucket '$name' with the relative path '$path' is invalid."
+ s"The provided location inside the bucket '$name' with the path '$path' is invalid."
)

/**