diff --git a/docs/administration-guide/gaffer-stores/simple-federated/access-control.md b/docs/administration-guide/gaffer-stores/simple-federated/access-control.md
new file mode 100644
index 0000000000..abe9f7eb93
--- /dev/null
+++ b/docs/administration-guide/gaffer-stores/simple-federated/access-control.md
@@ -0,0 +1,134 @@
+# Simple Federation Graph Access Control
+
+Graphs added to a federated store can have restrictions placed on them in
+addition to the standard user controls that may be in place on the data itself.
+
+## Restricting Graph Access
+
+To restrict access to a graph you must add the access controls when the graph
+is added to the federated store. Once added to a store a graph's access cannot
+be altered without removing and re-adding it.
+
+The available restrictions you can apply when adding a graph are as follows;
+additional sections on this page provide more detail where needed:
+
+- `owner` - The user ID of the graph's owner. If not specified this will be the
+ID of the user who added the graph. By default the owner does not affect the
+restrictions on the graph; the user ID has no additional privileges.
+- `isPublic` - Whether the graph is public; a public graph can be read by any
+  user.
+- `readPredicate` - This is an access control predicate that is checked when
+operations are performed to see if the user running the operations can read the
+graph.
+- `writePredicate` - This is an access control predicate that is checked when a
+user is trying to modify the graph. Modification in this case refers to editing
+the configured graph, such as changing the graph ID or deleting the graph from
+the store; it does not affect adding or deleting data inside the graph.
+
+A full example of adding a graph with all these restrictions would look like:
+
+!!! example ""
+    === "Java"
+
+        ```java
+        final String graphOwner = "graphOwner";
+
+        final AddGraph operation = new AddGraph.Builder()
+            .graphConfig(new GraphConfig(graphId))
+            .schema(new Schema())
+            .properties(new Properties())
+            .owner(graphOwner)
+            .isPublic(true)
+            .readPredicate(new AccessPredicate(
+                new DefaultUserPredicate(graphOwner, Arrays.asList("readAuth1", "readAuth2"))))
+            .writePredicate(new AccessPredicate(
+                new DefaultUserPredicate(graphOwner, Arrays.asList("writeAuth1", "writeAuth2"))))
+            .build();
+        ```
+
+    === "JSON"
+
+        ```json
+        {
+            "class": "uk.gov.gchq.gaffer.federated.simple.operation.AddGraph",
+            "graphConfig": {
+                "graphId": "myGraph"
+            },
+            "schema": {
+                "entities": {},
+                "edges": {},
+                "types": {}
+            },
+            "properties": {
+                "gaffer.store.class": "uk.gov.gchq.gaffer.accumulostore.AccumuloStore",
+                "gaffer.store.properties.class": "uk.gov.gchq.gaffer.accumulostore.AccumuloProperties",
+                "gaffer.cache.service.class": "uk.gov.gchq.gaffer.cache.impl.HashMapCacheService"
+            },
+            "owner": "graphOwner",
+            "isPublic": true,
+            "readPredicate": {
+                "class": "uk.gov.gchq.gaffer.access.predicate.AccessPredicate",
+                "userPredicate": {
+                    "class": "uk.gov.gchq.gaffer.access.predicate.user.DefaultUserPredicate",
+                    "creatingUserId": "graphOwner",
+                    "auths": [ "readAuth1", "readAuth2" ]
+                }
+            },
+            "writePredicate": {
+                "class": "uk.gov.gchq.gaffer.access.predicate.AccessPredicate",
+                "userPredicate": {
+                    "class": "uk.gov.gchq.gaffer.access.predicate.user.DefaultUserPredicate",
+                    "creatingUserId": "graphOwner",
+                    "auths": [ "writeAuth1", "writeAuth2" ]
+                }
+            }
+        }
+        ```
+
+## Public and Private Graphs
+
+Graphs added to a federated store can have an `isPublic` field added to them.
+This field controls whether the added graph is public, which means all users can
+submit requests to this graph from the federated store. A public graph will
+essentially ignore any read predicate applied to it, assuming all users can
+see at least some data in the graph. Even if a graph is public, restrictions
+on the data inside it will still apply.
+
+If `isPublic` has been set to `false` the graph will be added as private.
+A private graph will check the specified read predicate to ensure the user
+has access before running a query.
+
+!!! note
+    A federated store can be configured to disallow any public graphs from being
+    added; please see the [store properties](./configuration.md#store-properties)
+    for more details.
+
+## Read and Write Access
+
+As previously mentioned, read/write access can be applied to graphs added to
+federated stores.
+
+!!! warning "Please be aware"
+    Reading from a graph is taken to mean running any operation on the
+    respective graph; this includes operations such as `AddElements`. Write
+    access to the graph is required for modifying how it is stored in the
+    federated store, for example, deleting or renaming the graph.
+
+### Access Control Predicates
+
+To determine if a user has access to read or write, a predicate can be
+specified that will be checked before any operation related to the graph is
+executed.
+
+All predicates are passed through by specifying them as the `userPredicate` in
+the constructor of an `AccessPredicate`. Some default predicates are available
+and are listed below; if you wish to write your own predicate, it must
+implement Java's [`Predicate`](https://docs.oracle.com/javase/8/docs/api/java/util/function/Predicate.html)
+interface.
+
+- `DefaultUserPredicate` - Can be used to define a list of auth strings a user
+must have to satisfy the predicate. This will also pass if the user matches the
+`creatingUserId` the predicate was initialised with (this does not have to be
+the same as the graph owner).
+- `NoAccessUserPredicate` - Will always deny any access if used.
+- `UnrestrictedAccessUserPredicate` - Will always permit access if used.
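The requirement stated at the end of access-control.md, that a custom access predicate implements Java's `Predicate` interface, can be sketched as follows. This is a minimal, self-contained illustration and not Gaffer's actual API: `SketchUser` is a hypothetical stand-in for whatever user object the store passes to the predicate, and `RequiredAuthsPredicate` is an invented name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for the user object the predicate is tested against;
// a real deployment would use Gaffer's own user class instead.
class SketchUser {
    private final String userId;
    private final Set<String> auths;

    SketchUser(final String userId, final String... auths) {
        this.userId = userId;
        this.auths = new HashSet<>(Arrays.asList(auths));
    }

    String getUserId() {
        return userId;
    }

    Set<String> getAuths() {
        return auths;
    }
}

// Hypothetical custom predicate: passes only if the user holds every required auth.
class RequiredAuthsPredicate implements Predicate<SketchUser> {
    private final Set<String> requiredAuths;

    RequiredAuthsPredicate(final String... requiredAuths) {
        this.requiredAuths = new HashSet<>(Arrays.asList(requiredAuths));
    }

    @Override
    public boolean test(final SketchUser user) {
        // A missing user never satisfies the predicate.
        return user != null && user.getAuths().containsAll(requiredAuths);
    }
}
```

A predicate like this would presumably be wrapped in an `AccessPredicate` as the `userPredicate`, in the same way `DefaultUserPredicate` is in the `AddGraph` example. Whether further requirements apply, such as JSON serialisability, is not stated in the page and would need checking.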
diff --git a/docs/administration-guide/gaffer-stores/simple-federated/additional-info.md b/docs/administration-guide/gaffer-stores/simple-federated/additional-info.md
new file mode 100644
index 0000000000..90bf8c6895
--- /dev/null
+++ b/docs/administration-guide/gaffer-stores/simple-federated/additional-info.md
@@ -0,0 +1,97 @@
+# Additional Information on Simple Federation
+
+This page contains additional information and considerations
+an admin may need to know when using the federated store type.
+
+## How are Operations Handled?
+
+Gaffer operations are handled quite differently when using the federated store.
+The general usage is that the operation submitted to the store will be forwarded
+to the sub graph for execution. This means a user can typically use a federated
+store like they would a normal store by submitting the same operation chains they
+would use on any other store.
+
+A user has control of some aspects of federation using the options passed to the
+operation. These can be used to do things like pick graphs or control the
+merging. A full list of the available options is outlined in the following
+table:
+
+| Option | Description |
+| --- | --- |
+| `federated.graphIds` | List of graph IDs to submit the operation to, formatted as a comma separated string e.g. `"graph1,graph2"` |
+| `federated.excludedGraphIds` | List of graph IDs to exclude from the query. If this is set, any graph IDs in a `federated.graphIds` option are ignored and the operation is instead executed on all graphs except the ones specified, e.g. `"graph1,graph2"` |
+| `federated.aggregateElements` | Should the element aggregator be used when merging element results. |
+| `federated.forwardChain` | Should the whole operation chain be sent to the sub graph or not. If set to `false` each operation inside the chain will be sent separately, so merging from each graph will happen after each operation instead of at the end of the chain. Turning this off is inherently slower, so it is `true` by default. |
+
+Along with the options above, all merge classes can be overridden per query
+using the same property key as you would via the store properties. Please see
+the table [here](./configuration.md#store-properties) for more information.
+
+If you wish to submit different operations to different graphs in the same query
+you can do this using the `federated.forwardChain` option. By setting this to
+`false` on the outer operation chain the options on the operations inside it will
+be honoured. An example of this can be seen below:
+
+!!! note
+    This will turn off any merging of the results at the end of the chain; the
+    operation chain will act like a standard chain where each operation's output
+    is now the input of the next operation. However, merging will still happen
+    on each operation if more than one graph is specified for it.
+
+!!! example ""
+    This seeds for an entity from one graph and adds it into another graph.
+
+    ```json
+    {
+        "class": "OperationChain",
+        "options": {
+            "federated.forwardChain": false
+        },
+        "operations": [
+            {
+                "class": "GetElements",
+                "options": {
+                    "federated.graphIds": "graph1"
+                },
+                "input": [
+                    {
+                        "class": "EntitySeed",
+                        "vertex": "1"
+                    }
+                ]
+            },
+            {
+                "class": "AddElements",
+                "options": {
+                    "federated.graphIds": "graph2"
+                }
+            }
+        ]
+    }
+    ```
+
+## Cache Considerations
+
+The federated store utilises the [Gaffer cache](../store-guide.md#caches) to store
+graphs that have been added to the store. This means all features available to
+normal caches are also available to the graph storage, allowing the sharing and
+persisting of graphs between instances.
+
+The federated store will use the default cache service to store graphs in. It
+will also add a standard suffix, meaning if you want to share graphs you will
+need to set this to something other than the graph ID (see [here](../store-guide.md#cache-service)).
+
+## Schema Compatibility
+
+When querying multiple graphs, the federated store will attempt to merge each graph's schema together. This means the schemas will need to be
+compatible in order to query across them. Generally you will need to ensure
+any shared groups can be merged correctly; a few examples of criteria to
+consider are:
+
+- Any properties in a shared group defined in both schemas need to have the same
+  type and aggregation function.
+- Any visibility properties need to be compatible or they will be removed from the
+  schema.
+- Groups with different properties in each schema will be merged so the group has
+  all the properties in the merged schema.
+- Any groupBy definitions need to be compatible or will be removed.
diff --git a/docs/administration-guide/gaffer-stores/simple-federated/configuration.md b/docs/administration-guide/gaffer-stores/simple-federated/configuration.md
new file mode 100644
index 0000000000..3bd4dfc370
--- /dev/null
+++ b/docs/administration-guide/gaffer-stores/simple-federated/configuration.md
@@ -0,0 +1,228 @@
+# Simple Federated Store Configuration
+
+!!! warning
+    The simple federated store is still under development, with scope to replace
+    the standard federated store in release 2.4.0. Some configuration options
+    and features may be subject to change.
+
+## Introduction
+
+The Simple Federated Store enables a user to add and query multiple Gaffer
+graphs through a single endpoint/instance. Queries submitted to a federated
+store are forwarded to a select set of graphs that then execute the query
+locally. The results from each graph are aggregated together to form the
+final result to give the appearance of coming from one graph.
+
+Due to its unique nature a federated store has various additional configuration
+and features compared to a normal store. This page covers the different
+configuration an admin can apply to this store type. Further information on
+[controlling graph access](./access-control.md) and [additional considerations](./additional-info.md)
+with a federated store can be found on their respective pages.
+
+To get started with a federated store simply set the store class and properties
+like:
+
+```properties
+gaffer.store.class=uk.gov.gchq.gaffer.federated.simple.FederatedStore
+gaffer.store.properties.class=uk.gov.gchq.gaffer.federated.simple.FederatedStoreProperties
+```
+
+## Store Properties
+
+As with a standard Gaffer graph the usual store properties are available to a
+federated store; however, additional properties are available to configure
+the different aspects of federating. The table below covers store properties
+specific to a federated store and their usage.
+
+!!! note
+    Many of the merge related properties are just defaults and can be overridden
+    by the user on a per query basis.
+
+| Property | Default | Description |
+| --- | --- | --- |
+| `gaffer.store.federated.default.graphIds` | `""` | The default graph IDs to use if a user does not specify which graph(s) to run their query on. Takes a comma separated list of graph IDs e.g. `"graphID1,graphID2"` |
+| `gaffer.store.federated.allowPublicGraphs` | `true` | Are graphs with public access allowed to be added to this store. |
+| `gaffer.store.federated.default.aggregateElements` | `false` | Should queries aggregate returned Gaffer elements together using the binary operator for merging elements. `false` by default as aggregation can be slower; when disabled, results are simply chained into one big list. |
+| `gaffer.store.federated.merge.number.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.Sum` | Default binary operator for merging [`Number`](https://docs.oracle.com/javase/8/docs/api/java/lang/Number.html) results (e.g. from a `Count` operation) from multiple graphs. |
+| `gaffer.store.federated.merge.string.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.StringConcat` | Default binary operator for merging [`String`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html) results from multiple graphs. |
+| `gaffer.store.federated.merge.boolean.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.And` | Default binary operator for merging [`Boolean`](https://docs.oracle.com/javase/8/docs/api/java/lang/Boolean.html) results from multiple graphs. |
+| `gaffer.store.federated.merge.collection.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.CollectionConcat` | Default binary operator for merging [`Collection`](https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html) results from multiple graphs. |
+| `gaffer.store.federated.merge.map.class` | `uk.gov.gchq.koryphe.impl.binaryoperator.Last` | Default binary operator for merging the values of [`Map`](https://docs.oracle.com/javase/8/docs/api/java/util/Map.html) results when two of the same keys exist from multiple graphs. |
+| `gaffer.store.federated.merge.element.class` | `uk.gov.gchq.gaffer.federated.simple.merge.operator.ElementAggregateOperator` | Default binary operator for merging Iterables of Gaffer elements from multiple graphs. |
+
+## Merge Operators
+
+A key part of the federated store is the merge operators. These control how
+results from multiple graphs are reduced to one result so can greatly affect the
+results returned by the store. As outlined in the [store properties section](#store-properties),
+these operators can be configured with defaults or overridden for a query via the
+operation options using the same properties.
+
+Sensible defaults are in place if none are specified; however, you may wish to choose your
+own operators to be used. The only requirement for an operator is for it to
+satisfy Java's [`BinaryOperator`](https://docs.oracle.com/javase/8/docs/api/java/util/function/BinaryOperator.html)
+interface; you can then specify it using the property key for the data type you
+wish to use it for.
+
+### The Default Element Merge Operator
+
+The default operator used to merge Gaffer elements is unique compared to the
+other operators. This operator will only be used if element aggregating is set
+to `true`, either by default, using the store properties, or for just the query
+using the operation option `federated.aggregateElements`.
+
+When enabled, the default merge operator attempts to use the aggregation
+functions from the merged schema of the graphs that were executed on. This
+attempts to emulate how the data would have been stored in a single Gaffer graph,
+as entities or edges that are the same (e.g. same group and vertices) will be
+merged together with their properties aggregated using the functions defined in
+the schema.
+
+#### Considerations
+
+There are some considerations you may wish to know when using the element merge
+operator:
+
+- This type of merging will be inherently slower than simply returning a chained
+iterable of elements.
+- The results must fit in the available memory of the federated store to be
+merged. If the returned result size is too big you may experience significant
+performance issues.
+- The results will be deduplicated as part of this process e.g. two identical
+entities or edges will be merged into one.
+- Any filtering you might have specified in the `View` will only be applied
+to the individual graph results; this means two results may each satisfy the
+`View` separately, but once aggregated they may not.
+- If you wish to write or use your own operator for merging elements the class
+must extend the [`ElementAggregateOperator`](https://github.com/gchq/Gaffer/blob/develop/store-implementation/simple-federated-store/src/main/java/uk/gov/gchq/gaffer/federated/simple/merge/operator/ElementAggregateOperator.java).
+
+## Adding and Removing Graphs
+
+A federated store's main purpose is to hold a library of 'sub' graphs. These
+graphs are stored in the Gaffer [cache](./additional-info.md#cache-considerations) so can be shared between multiple
+federated stores.
+
+You can think of a graph that has been added to a federated store as essentially
+a pointer to the real graph. This generally means all the information required
+to create the graph in the first place (e.g. schema, store properties etc.) is
+required to add the graph to a federated store. Because of this, a common design
+pattern you may wish to adopt is to have one running Accumulo cluster to which
+you can add multiple Gaffer graphs through the federated store. This means you
+do not need to set up multiple Gaffer instances and can query all of the graphs
+through the federated store.
+
+```mermaid
+graph LR
+    A(["Federated Store"])-->B(["Gaffer Graph 1"])
+    A-->C(["Gaffer Graph 2"])
+    B-->D(["Accumulo"])
+    C-->D
+```
+
+### Adding a new Graph
+
+To add a new graph to a federated store, a dedicated operation called
+`AddGraph` is available to federated stores. This operation lets you input the
+graph config, schema and store properties for the graph, letting you add a
+new graph like so:
+
+!!! example ""
+    === "Java"
+        ```java
+        // Choose a graph ID for your graph
+        final String graphId = "myGraph";
+
+        // Replace the graph config, schema and properties for your use case
+        final AddGraph operation = new AddGraph.Builder()
+            .graphConfig(new GraphConfig(graphId))
+            .schema(new Schema())
+            .properties(new Properties())
+            .build();
+        ```
+
+    === "JSON"
+        Replace the graph config, schema and properties for your use case.
+
+        ```json
+        {
+            "class": "uk.gov.gchq.gaffer.federated.simple.operation.AddGraph",
+            "graphConfig": {
+                "graphId": "myGraph"
+            },
+            "schema": {
+                "entities": {},
+                "edges": {},
+                "types": {}
+            },
+            "properties": {
+                "gaffer.store.class": "uk.gov.gchq.gaffer.accumulostore.AccumuloStore",
+                "gaffer.store.properties.class": "uk.gov.gchq.gaffer.accumulostore.AccumuloProperties",
+                "gaffer.cache.service.class": "uk.gov.gchq.gaffer.cache.impl.HashMapCacheService"
+            }
+        }
+        ```
+
+Once a graph has been added the graph ID will become available to the store so
+can be referenced when running an operation. More information on running an
+operation on a sub graph and available operation options can be found on the
+[following page](./additional-info.md#how-are-operations-handled).
+
+!!! note
+    Added graphs can also have access controls enforced on them; please see the
+    [access control guide](./access-control.md) for more information.
+
+### Removing a Graph
+
+Along with adding a graph you can also remove a graph from a federated store. By
+default this will simply dereference the graph, meaning if the graph had
+persistent storage the data will be left untouched. The data can then be
+re-accessed at a later date by simply adding the graph back to the store.
+
+When removing a graph you can also opt to delete all the data as well. This
+means the data cannot be recovered by simply re-adding the graph at a
+later date.
+
+!!! note
+    To remove a graph a user requires write access to the graph; please see the
+    [access control guide](./access-control.md) for more information.
+
+To remove a graph you can use the `RemoveGraph` operation like so:
+
+!!! example ""
+    === "Java"
+        Remove a graph leaving the data untouched (if persistent).
+
+        ```java
+        final RemoveGraph removeGraph = new RemoveGraph.Builder()
+            .graphId(graphId)
+            .build();
+        ```
+
+        Remove a graph and delete all the data.
+
+        ```java
+        final RemoveGraph removeGraph = new RemoveGraph.Builder()
+            .graphId(graphId)
+            .deleteAllData(true)
+            .build();
+        ```
+
+    === "JSON"
+        Remove a graph leaving the data untouched (if persistent).
+
+        ```json
+        {
+            "class": "uk.gov.gchq.gaffer.federated.simple.operation.RemoveGraph",
+            "graphId": "myGraph"
+        }
+        ```
+
+        Remove a graph and delete all the data.
+
+        ```json
+        {
+            "class": "uk.gov.gchq.gaffer.federated.simple.operation.RemoveGraph",
+            "graphId": "myGraph",
+            "deleteAllData": true
+        }
+        ```
diff --git a/mkdocs.yml b/mkdocs.yml
index 7c002f76fa..c1ccb87dbb 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -152,6 +152,10 @@ nav:
       - 'Store Guide': 'administration-guide/gaffer-stores/store-guide.md'
       - 'Accumulo Store': 'administration-guide/gaffer-stores/accumulo-store.md'
       - 'Federated Store': 'administration-guide/gaffer-stores/federated-store.md'
+      - Simple Federated Store:
+        - 'Configuration': 'administration-guide/gaffer-stores/simple-federated/configuration.md'
+        - 'Access Control': 'administration-guide/gaffer-stores/simple-federated/access-control.md'
+        - 'Additional Information': 'administration-guide/gaffer-stores/simple-federated/additional-info.md'
       - 'Map Store': 'administration-guide/gaffer-stores/map-store.md'
       - 'Proxy Store': 'administration-guide/gaffer-stores/proxy-store.md'
     - Configuration:
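The merge operator requirement described in configuration.md, that any custom operator satisfies Java's `BinaryOperator` interface, can be sketched as below. `MaxNumberMerge` is a hypothetical class name chosen for illustration. How such a class would need to be packaged for the store to load it is not covered by the page and is left out here.

```java
import java.util.function.BinaryOperator;

// Hypothetical merge operator: keeps the larger of two numeric results
// from different graphs instead of summing them.
class MaxNumberMerge implements BinaryOperator<Number> {

    @Override
    public Number apply(final Number a, final Number b) {
        // Treat a missing result from one graph as "no contribution".
        if (a == null) {
            return b;
        }
        if (b == null) {
            return a;
        }
        // Compare as doubles so mixed numeric types can be handled.
        return Double.compare(a.doubleValue(), b.doubleValue()) >= 0 ? a : b;
    }
}
```

Going by the store properties table, an operator like this would be selected for `Number` results by setting `gaffer.store.federated.merge.number.class` to its fully qualified class name, either in the store properties or as a per-query operation option.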