Gh-398: Where to run gaffer administration guide #406

@@ -2,6 +2,21 @@

When deploying Accumulo, either as part of a Gaffer stack or standalone, the passwords for all users and the `instance.secret` are set to default values and should be changed. Note that the `instance.secret` cannot be changed once deployed, as it is used during initialisation, so it must be set before the first deployment.

## Standard Deployment

The passwords can be configured in a standard deployment via the `accumulo.properties` file.

The following table outlines the values and defaults if using the container images:

| Name | value | default value |
| -------------------- | ------------------------------- | ------------- |
| Instance Secret | `instance.secret` | "DEFAULT" |
| Tracer user | `trace.user` | "root" |
| Tracer user password | `trace.token.property.password` | "secret" |
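
As a rough sketch of replacing the defaults in the table above, the following generates random values and writes them to a scratch file. The file name `accumulo-overrides.properties` and the `random_hex` helper are hypothetical; only the three property names come from the table. The resulting lines would then need to be merged into the `accumulo.properties` file used by your deployment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: produce a random 32-character hex string.
random_hex() {
  head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

INSTANCE_SECRET="$(random_hex)"
TRACE_PASSWORD="$(random_hex)"

# Write the non-default values to a scratch file (hypothetical name).
# instance.secret must be in place before Accumulo is first initialised.
cat > accumulo-overrides.properties <<EOF
instance.secret=${INSTANCE_SECRET}
trace.user=root
trace.token.property.password=${TRACE_PASSWORD}
EOF
```

Generating the values rather than typing them keeps the defaults `DEFAULT` and `secret` from surviving by accident; store the generated values somewhere safe, since the instance secret cannot be changed later.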


## Helm Deployment

When deploying the Accumulo helm chart, the following values are set. If you are using the Gaffer helm chart with the Accumulo integration, the values will be prefixed with "accumulo":

| Name | value | default value |
97 changes: 97 additions & 0 deletions docs/administration-guide/gaffer-config/graph-metadata.md
@@ -0,0 +1,97 @@
# Graph Metadata Configuration

The graph configuration file is a JSON file that configures a few aspects of the
Gaffer graph. Primarily it is used to set the name and description, along with
any additional hooks to run before an operation chain, e.g. to impose limits on
max results. For example, a simple graph configuration file might look
like:

```json title="graphConfig.json"
{
    "graphId": "ExampleGraph",
    "description": "An example graph"
}
```

## Configuring a Standard Deployment

To change any of the values for a standard Gaffer deployment, all that's needed
is to configure the JSON file for the `graphConfig`. The key-value pairs in
the file can then be configured as you wish, and upon restarting the graph
the values will be updated (assuming the file is loaded correctly).

The standard location of the file in the Gaffer images is `/gaffer/graph/graphConfig.json`.
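
One possible way to apply this with containers is to write the file on the host and bind-mount it over the default location. The sketch below assumes the `gchq/gaffer-rest:2.0.0` image, port 8080, and a file in the current directory; the function name `run_rest_with_config` is hypothetical, and nothing is started until you call it.

```shell
#!/usr/bin/env bash
# Write a minimal graph configuration on the host.
cat > graphConfig.json <<'EOF'
{
    "graphId": "ExampleGraph",
    "description": "An updated description"
}
EOF

# Hypothetical wrapper: mount the host file over the default one in the image.
# An absolute host path is used so Docker treats it as a bind mount.
run_rest_with_config() {
  docker run \
    -p 8080:8080 \
    -v "$(pwd)/graphConfig.json:/gaffer/graph/graphConfig.json" \
    gchq/gaffer-rest:2.0.0
}
```

Calling `run_rest_with_config` after editing the file again restarts the container with the new values.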

## Configuring a Helm Deployment

Configuring the graph metadata via Helm follows a similar principle to the JSON
files; however, you must use the YAML format instead for the key-value pairs. The
following gives an example of how the description value can be updated via Helm.

Create a file called `graph-meta.yaml`. We will use this file to add our
description and graph ID. Changing the description is as easy as changing the
`graph.config.description` value.

```yaml
graph:
  config:
    description: "My graph description"
```

Upgrade your deployment using Helm to load the new file:

```bash
helm upgrade my-graph gaffer-docker/gaffer -f graph-meta.yaml --reuse-values
```

The `--reuse-values` argument means we do not override any passwords that we set
in the initial construction.

!!! tip
    You can see your new description if you go to the Swagger UI and call the
    `/graph/config/description` endpoint.
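
The same check can be sketched from the command line. The base URL below (`http://localhost:8080/rest`) is an assumption about where your REST API is exposed, and `get_graph_description` is a hypothetical helper name; only the `/graph/config/description` path comes from the tip above.

```shell
#!/usr/bin/env bash
# Assumed base URL of the Gaffer REST API; override it for your deployment.
GAFFER_REST_URL="${GAFFER_REST_URL:-http://localhost:8080/rest}"

# Hypothetical helper: fetch the current graph description.
# -f fails on HTTP errors, -sS hides progress but still prints errors.
get_graph_description() {
  curl -fsS "${GAFFER_REST_URL}/graph/config/description"
}
```

Running `get_graph_description` after the upgrade should print the description you set, provided the REST API is reachable at that address.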

## Updating the Graph ID

This may be simple or complicated depending on your store type. If you are using
a Map or Federated store, you can just set the `graph.config.graphId` value in
the same way. Though if you are using a Map Store, the graph will be emptied as
a result.

However, if you are using the Accumulo store, updating the `graphId` is a little
more complicated since the graph ID corresponds to an Accumulo table. We have to
change the `gaffer` user's permissions to read and write to that table. To do that,
update your configuration with the following contents:

=== "JSON"
    Configure the `graphConfig.json` file.

    ```json
    {
        "graphId": "MyGraph",
        "description": "My Graph description"
    }
    ```

=== "YAML"
    Add to a `graph-meta.yaml` or similar file and load via Helm.

    ```yaml
    graph:
      config:
        graphId: "MyGraph"
        description: "My Graph description"

    accumulo:
      config:
        userManagement:
          users:
            gaffer:
              permissions:
                table:
                  MyGraph:
                    - READ
                    - WRITE
                    - BULK_IMPORT
                    - ALTER_TABLE
    ```
@@ -14,8 +14,8 @@ The sections below walkthrough the features of Schemas in detail and explain how

## Elements schema
The Elements schema is designed to be a high level document describing what information your Graph contains, i.e. the different kinds of edges and entities and the list of properties associated with each.
Essentially this part of the schema should just be a list of all the entities and edges in the graph.
Edges describe the relationship between a source vertex and a destination vertex.
Entities describe a vertex. Edges describe the relationship between a source vertex and a destination vertex.
We use the term "element" to mean either an edge or an entity.

@@ -47,7 +47,7 @@ Edges and Entities can optionally have the following fields:
- `properties` - Properties are defined by a map of key-value pairs of property names to property types. Property types are described in the Types schema.
- `groupBy` - Allows you to specify extra properties (in addition to the element group and vertices) to use for controlling when similar elements should be grouped together and summarised. By default Gaffer uses the element group and its vertices to group similar elements together when aggregating and summarising elements.
- `visibilityProperty` - Used to specify the property to use as a visibility property when using visibility properties in your graph. If sensitive elements have a visibility property then set this field to that property name. This ensures Gaffer knows to restrict access to sensitive elements.
- `timestampProperty` - Used to specify the timestamp property in your graph, so Gaffer Stores know to treat that property specially. Setting this is optional and does not affect the queries available to users. This property allows Store implementations like Accumulo to optimise the way the timestamp property is persisted. For these stores, using it can give a very slight performance improvement due to the lazy loading of properties. For more information [see the timestamp section of the Accumulo Store Reference](../gaffer-stores/accumulo-store.md#timestamp).
- `aggregate` - Specifies if aggregation is enabled for this element group. True by default. If you would like to disable aggregation, set this to false.

These 2 optional fields are for advanced users. They can go in the Elements Schema, however we have split them out into separate Validation and Aggregation Schema files for this page, so the logic doesn't complicate the Elements schema.
@@ -392,6 +392,16 @@ Once the schema has been loaded into a graph the parent elements are merged into
}
```

## Helm Deployment

The easiest way to deploy a schema file is to use Helm's `--set-file` option, which lets you set a value from the contents of a file.
For a Helm deployment to pick up changes to a schema, you need to run a `helm upgrade`:

```bash
helm upgrade my-graph gaffer-docker/gaffer --set-file graph.schema."schema\.json"=./schema.json --reuse-values
```

The `--reuse-values` argument tells Helm to reuse the values from the previous release, including the passwords.
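
The upgrade command above can be wrapped in a small function that refuses to run when the schema file is missing. This is only a sketch: the function name `upgrade_schema` is hypothetical, while the release name, chart, and `--set-file` key are taken from the command above. The backslash is needed because Helm treats an unescaped dot in a key as a path separator.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the helm upgrade command shown above.
# Usage: upgrade_schema <release-name> <path-to-schema.json>
upgrade_schema() {
  local release="$1"
  local schema_file="$2"

  # Fail early rather than upgrading with a missing file.
  if [ ! -f "$schema_file" ]; then
    echo "schema file not found: $schema_file" >&2
    return 1
  fi

  # The escaped dot keeps "schema.json" as a single key under graph.schema.
  helm upgrade "$release" gaffer-docker/gaffer \
    --set-file "graph.schema.schema\.json=$schema_file" \
    --reuse-values
}
```

For example, `upgrade_schema my-graph ./schema.json` is equivalent to the command above when the file exists.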

## Java API

@@ -0,0 +1,111 @@
# Gaffer Images

As demonstrated in the [quickstart](../quickstart.md), it's very simple to start
up a basic in-memory Gaffer graph using the available Open Container Initiative
(OCI) images.

For large-scale graphs with persistent storage you will want to use a different
storage backend to a Map Store, the recommended one being Accumulo. To do this, a
different deployment of containers is required. This guide will run through the
containers needed for a basic Accumulo cluster and how to configure and create
custom images of Gaffer.

## Available Images

Currently there are a few different images that can be used to run a Gaffer
deployment. The main ones are outlined in the following table and are all
available on [Docker Hub](https://hub.docker.com/u/gchq).

| Image | Description |
| ----- | ----------- |
| `gchq/accumulo` | This image is a containerised deployment of [Apache Accumulo](https://accumulo.apache.org/). This was created as historically there has not been an available official image from the maintainers of Accumulo; however, there has since been an [official image](https://github.com/apache/accumulo-docker) made available but it is not currently in use in Gaffer. |
| `gchq/hdfs` | A custom image for running HDFS (Hadoop file system) via a container. Contains an official release of [Apache Hadoop](https://hadoop.apache.org/) which is used as the scalable data storage for Accumulo. |
| `gchq/gaffer` | This is the main container image for Gaffer that is built on top of the `gchq/accumulo` image so includes a release of `zookeeper`, `hdfs` and `accumulo` along with the Gaffer libraries. Running this image simply runs the Accumulo instance, not a Gaffer instance. |
| `gchq/gaffer-rest` | This is the REST API image containing the files that can be used to configure the graph to connect to the chosen store. By default there are some pre-configured config files, which can be overridden by a [bind-mount](#volumes-and-bind-mount) of alternatives. |

!!! note
    There are a few other images available; however, they are less frequently
    used or purely example images. Please see the [`gaffer-docker`](https://github.com/gchq/gaffer-docker/tree/develop/docker)
    repository for more details.

## Volumes and bind-mount

To change and configure the graph that is deployed you will need to override
the default files in the images. You can of course create a custom
image with different config files; however, it can be more flexible to just
bind-mount over the existing files.

To do this you will need to know the location of the files in the image you
want to override, but in many cases you can mount over an entire directory,
for example:

!!! example ""
    The path `/custom/configs` is some path on the host system with different
    config files in that can be mounted in when running the image.

    ```bash
    docker run \
           -p 8080:8080 \
           -v /custom/configs/gaffer/graph:/gaffer/graph \
           -v /custom/configs/gaffer/schema:/gaffer/schema \
           -v /custom/configs/gaffer/store:/gaffer/store \
           gchq/gaffer-rest:2.0.0
    ```

## Custom Images

To avoid managing a file on the host and bind-mounting it, the configuration can be
baked into the image. This works well if the configuration itself is rather
static and the same across all environments.

Creating a custom image can also be useful if you want to load custom extensions
to use with Gaffer (e.g. Jars) by default.

To create a custom image simply make a new `Dockerfile` and use one of the Gaffer
images as the base image like the following:

```dockerfile
FROM gchq/gaffer-rest:latest

# Copy over the existing directory with store configs in
COPY ./custom/configs/gaffer/store /gaffer/store
```

Then build the new image using a suitable tool or just plain Docker from the
current directory like:

```bash
docker build -t my-gaffer-rest .
```
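
The two steps above can be scripted together. The sketch below prepares a build context and writes a Dockerfile that pins the base image to a specific tag rather than `latest`, so rebuilds stay reproducible. The `2.0.0` tag is an assumption about the release you are targeting, and `build_custom_image` is a hypothetical helper that only runs when you call it.

```shell
#!/usr/bin/env bash
# Create the directory that will hold your store configuration files.
mkdir -p custom/configs/gaffer/store

# Write a Dockerfile with a pinned base image tag (assumed: 2.0.0).
cat > Dockerfile <<'EOF'
FROM gchq/gaffer-rest:2.0.0

# Copy over the existing directory with store configs in
COPY ./custom/configs/gaffer/store /gaffer/store
EOF

# Hypothetical helper: build the image from the current directory.
build_custom_image() {
  docker build -t my-gaffer-rest .
}
```

Place your store config files under `custom/configs/gaffer/store` before calling `build_custom_image`.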

### Adding Additional Libraries

By default with the Gaffer deployment you get access to the:

- Sketches library
- Time library
- Bitmap library
- JCS cache library

If you want more libraries than this (either one of ours or one of your own) you
will need to customise the Docker images and use them in place of the defaults.

At the moment, the `gchq/gaffer-rest` image uses a runnable jar file located at
`/gaffer/jars`. When it runs it includes the `/gaffer/jars/lib` directory on the
classpath. There is nothing in there by default because all the dependencies are
bundled into the JAR. However, if you want to add your own jars, you can add
them to this directory like the following:

```dockerfile
FROM gchq/gaffer-rest:latest
COPY ./custom-lib-1.0-SNAPSHOT.jar /gaffer/jars/lib/
```

To add any libraries to the `gchq/gaffer` image in order to push down any extra
value objects and filters to Accumulo you have to add the jars to the
`/opt/accumulo/lib/ext` directory:

```dockerfile
FROM gchq/gaffer:latest
COPY ./my-library-1.0-SNAPSHOT.jar /opt/accumulo/lib/ext
```