gh-378: gaffer documentation restructure #383

Merged: 22 commits, Sep 13, 2023
2 changes: 1 addition & 1 deletion .github/workflows/build-docs.yml
@@ -30,7 +30,7 @@ jobs:
run: pip install -r requirements.txt

- name: Build using MkDocs
run: mkdocs build
run: mkdocs build -s

- name: Check Links
uses: gaurav-nelson/github-action-markdown-link-check@v1
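
The added `-s` switch turns on MkDocs strict mode (long form `--strict`), which aborts the build on any warning, including the broken internal links a restructure like this can easily introduce. A sketch of the resulting step, shown with the long-form flag for clarity:

```yaml
# Illustrative sketch only; mirrors the step in the workflow above.
- name: Build using MkDocs
  run: mkdocs build --strict   # equivalent to -s: fail the build on any warning
```
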
Empty file.
Empty file.
Empty file.
@@ -56,7 +56,7 @@ accumulo.user=myUser
accumulo.password=myPassword
```

When using Kerberos authentication, the username and password are not used; alternative properties are used to configure Kerberos instead. See the [Accumulo Kerberos guide for more information](../../gaffer2.0/accumulo-kerberos.md).
When using Kerberos authentication, the username and password are not used; alternative properties are used to configure Kerberos instead. See the [Accumulo Kerberos guide for more information](../../change-notes/migrating-from-v1-to-v2/accumulo-kerberos.md).

Note that if the graph does not exist, it will be created when a `Graph` object is instantiated using these properties, the schema and the graph ID (given when the graph is created in Java or via a `graphConfig.json`). In this case the user must have permission to create a table. If the graph already exists (based on the graph ID) then the user simply needs permission to read the table. For information about protecting data via setting the visibility, see [Visibility](#visibility).
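
For illustration, a minimal Java sketch of instantiating such a `Graph` (method names follow the usual Gaffer examples; `MyApp` and the file paths are hypothetical placeholders):

```java
import uk.gov.gchq.gaffer.commonutil.StreamUtil;
import uk.gov.gchq.gaffer.graph.Graph;
import uk.gov.gchq.gaffer.graph.GraphConfig;

// Opens the graph, creating the backing Accumulo table named after the
// graph ID if it does not already exist (subject to table permissions).
Graph graph = new Graph.Builder()
        .config(new GraphConfig.Builder()
                .graphId("myGraph")
                .build())
        .addSchemas(StreamUtil.openStreams(MyApp.class, "/schema"))  // schema JSON on the classpath
        .storeProperties("/path/to/store.properties")                // the properties shown above
        .build();
```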

@@ -129,7 +129,7 @@ Note that here `elements` could be a never-ending stream of `Element`s and the a

To ingest data via bulk import, a MapReduce job is used to convert your data into files of Accumulo key-value pairs that are pre-sorted to match the distribution of data in Accumulo. Once these files are created, Accumulo moves them from their current location in HDFS to the correct directory within Accumulo's data directory. The data in them is then available for query immediately.

Gaffer provides code to make this as simple as possible. The `AddElementsFromHdfs` operation is used to bulk import data. See [AddElementsFromHdfs](../operations-guide/hdfs.md#addelementsfromhdfs) for examples.
Gaffer provides code to make this as simple as possible. The `AddElementsFromHdfs` operation is used to bulk import data. See [AddElementsFromHdfs](../../reference/operations-guide/hdfs.md#addelementsfromhdfs) for examples.
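
As a rough, hypothetical sketch of such a bulk import in JSON (the field and class names are indicative of the HDFS operation and should be checked against the linked examples; the mapper generator is a placeholder):

```json
{
  "class": "uk.gov.gchq.gaffer.hdfs.operation.AddElementsFromHdfs",
  "inputMapperPairs": {
    "/data/input/edges.txt": "com.example.MyTextMapperGenerator"
  },
  "outputPath": "/data/output",
  "failurePath": "/data/failure",
  "jobInitialiser": {
    "class": "uk.gov.gchq.gaffer.hdfs.operation.handler.job.initialiser.TextJobInitialiser"
  }
}
```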

## Visibility

@@ -185,7 +185,7 @@ In Gaffer's `AccumuloStore` a key-package contains all the logic for:

A key-package is an implementation of the `AccumuloKeyPackage` interface. Gaffer provides two implementations: `ByteEntityKeyPackage` and `ClassicKeyPackage`. These names are essentially meaningless. The "classic" in `ClassicKeyPackage` refers to the fact that it is similar to the implementation in the first version of Gaffer (known as "Gaffer1").

Both key-packages should provide good performance for most use-cases. There will be slight differences in performance between the two for different types of query. The `ByteEntityKeyPackage` will be slightly faster if the query specifies that only out-going or in-coming edges are required. The `ClassicKeyPackage` will be faster when querying for all edges involving a pair of vertices. See the Key-Packages part of the [Accumulo Store Implementation page](../../dev/components/accumulo-store.md) for more information about these key-packages.
Both key-packages should provide good performance for most use-cases. There will be slight differences in performance between the two for different types of query. The `ByteEntityKeyPackage` will be slightly faster if the query specifies that only out-going or in-coming edges are required. The `ClassicKeyPackage` will be faster when querying for all edges involving a pair of vertices. See the Key-Packages part of the [Accumulo Store Implementation page](../../development-guide/project-structure/components/accumulo-store.md) for more information about these key-packages.
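
For reference, the key-package is chosen through the store properties. A sketch, assuming the standard property key and the `ClassicKeyPackage` class location (check the linked implementation page for the exact names):

```properties
# Sketch only: property key and class path assumed from the standard Accumulo store configuration
gaffer.store.accumulo.keypackage.class=uk.gov.gchq.gaffer.accumulostore.key.core.impl.classic.ClassicKeyPackage
```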

## Advanced properties

@@ -1,13 +1,13 @@
# Stores Guide

A Gaffer Store represents the backing database responsible for storing (or facilitating access to) a graph. Ordinarily a Store provides backing for a single graph. Stores which provide access to other stores can support multiple graphs. So far only the [Federated Store](federated.md) supports this.
A Gaffer Store represents the backing database responsible for storing (or facilitating access to) a graph. Ordinarily a Store provides backing for a single graph. Stores which provide access to other stores can support multiple graphs. So far only the [Federated Store](federated-store.md) supports this.

Gaffer currently supplies the following store implementations:

- [Map Store](map.md) - Simple in-memory store
- [Accumulo Store](accumulo.md) - [Apache Accumulo](https://accumulo.apache.org/) backed store
- [Proxy Store](proxy.md) - Delegates/forwards queries to another Gaffer REST
- [Federated Store](federated.md) - Federates queries across multiple graphs
- [Map Store](map-store.md) - Simple in-memory store
- [Accumulo Store](accumulo-store.md) - [Apache Accumulo](https://accumulo.apache.org/) backed store
- [Proxy Store](proxy-store.md) - Delegates/forwards queries to another Gaffer REST
- [Federated Store](federated-store.md) - Federates queries across multiple graphs
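
For illustration, selecting one of these stores comes down to the `store.properties` used by the graph. A minimal sketch for the Map Store, assuming the standard class names:

```properties
# Minimal sketch of a Map Store configuration
gaffer.store.class=uk.gov.gchq.gaffer.mapstore.MapStore
gaffer.store.properties.class=uk.gov.gchq.gaffer.mapstore.MapStoreProperties
```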

## Caches

@@ -19,7 +19,7 @@ Gaffer comes with three cache implementations:

The `HashMap` cache is not persistent. If using the Hazelcast instance of the Cache service, be aware that once the last node shuts down, all data will be lost. This is due to the data being held in memory in a distributed system.

For information on implementing caches, see [the cache developer docs page](../../dev/components/cache.md).
For information on implementing caches, see [the cache developer docs page](../../development-guide/project-structure/components/cache.md).
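
The cache implementation is likewise chosen through a store property. A minimal sketch, assuming the standard `HashMap` cache service class:

```properties
# Sketch only: selects the non-persistent in-memory HashMap cache service
gaffer.cache.service.class=uk.gov.gchq.gaffer.cache.impl.HashMapCacheService
```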

### Cache configuration

Empty file.
Empty file.
Empty file.
Empty file.
@@ -47,7 +47,7 @@ Edges and Entities can optionally have the following fields:
- `properties` - Properties are defined by a map of key-value pairs of property names to property types. Property types are described in the Types schema.
- `groupBy` - Allows you to specify extra properties (in addition to the element group and vertices) to use for controlling when similar elements should be grouped together and summarised. By default Gaffer uses the element group and its vertices to group similar elements together when aggregating and summarising elements.
- `visibilityProperty` - Used to specify the property to use as a visibility property when using visibility properties in your graph. If sensitive elements have a visibility property then set this field to that property name. This ensures Gaffer knows to restrict access to sensitive elements.
- `timestampProperty` - Used to specify the timestamp property in your graph, so Gaffer Stores know to treat that property specially. Setting this is optional and does not affect the queries available to users. This property allows Store implementations like Accumulo to optimise the way the timestamp property is persisted. For these stores, using it can give a very slight performance improvement due to the lazy loading of properties. For more information, [see the timestamp section of the Accumulo Store Reference](../../reference/stores-guide/accumulo.md#timestamp).
- `timestampProperty` - Used to specify the timestamp property in your graph, so Gaffer Stores know to treat that property specially. Setting this is optional and does not affect the queries available to users. This property allows Store implementations like Accumulo to optimise the way the timestamp property is persisted. For these stores, using it can give a very slight performance improvement due to the lazy loading of properties. For more information, [see the timestamp section of the Accumulo Store Reference](gaffer-stores/accumulo-store.md#timestamp).
- `aggregate` - Specifies if aggregation is enabled for this element group. True by default. If you would like to disable aggregation, set this to false.
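
Putting the core fields together, a hypothetical edge group using `properties` and `groupBy` might look like the following (group, property and type names are placeholders that would be defined in the Types schema):

```json
{
  "edges": {
    "RoadUse": {
      "source": "junction",
      "destination": "junction",
      "directed": "true",
      "properties": {
        "startDate": "date.earliest",
        "count": "count.long"
      },
      "groupBy": [ "startDate" ]
    }
  }
}
```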

These 2 optional fields are for advanced users. They can go in the Elements Schema; however, we have split them out into separate Validation and Aggregation Schema files for this page, so the logic doesn't complicate the Elements schema.
Empty file.
Empty file.
37 changes: 37 additions & 0 deletions docs/administration-guide/where-to-run-gaffer/gaffer-docker.md
@@ -0,0 +1,37 @@
# Gaffer Docker

The [gaffer-docker](https://github.com/gchq/gaffer-docker) repository contains
all code needed to run Gaffer using Docker.

All the files needed to get started using Gaffer in Docker are contained in the
['docker'](https://github.com/gchq/gaffer-docker/tree/develop/docker)
sub-folder.

In this directory you can find the Dockerfiles and docker compose files for
building container images for:

- [Gaffer](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer)
- [Gaffer's REST
API](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-rest)
- [Gaffer's Road Traffic
Example](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-road-traffic-loader)
- [HDFS](https://github.com/gchq/gaffer-docker/tree/develop/docker/hdfs)
- [Accumulo](https://github.com/gchq/gaffer-docker/tree/develop/docker/accumulo)
- [Gaffer's Integration
Tests](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-integration-tests)
- [gafferpy Jupyter
Notebook](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-pyspark-notebook)
- [Gaffer's JupyterHub Options
Server](https://github.com/gchq/gaffer-docker/tree/develop/docker/gaffer-jhub-options-server)
- [Spark](https://github.com/gchq/gaffer-docker/tree/develop/docker/spark-py)

Each directory contains a README with more specific information on what these
images are for and how to build them.

Please note that some of these containers are only useful when used by the
Helm Charts under Kubernetes, and may not run on their own.

## Requirements

Before you can build and run these containers you will need to install Docker or
a compatible equivalent (e.g. Podman).
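
As a rough sketch of the typical workflow (whether a given sub-folder ships a compose file should be checked against its README):

```bash
# Hypothetical example: clone the repository and bring up one of the images described above
git clone https://github.com/gchq/gaffer-docker.git
cd gaffer-docker/docker/gaffer-rest
docker compose up            # or build the image directly: docker build -t gaffer-rest .
```
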
@@ -3,12 +3,12 @@
Below is a summary of changes that have been made in Gaffer version 2.

### Accumulo 2 Support
The Accumulo store now supports Accumulo 2 and Hadoop 3 by default, with support for Accumulo 1 and Hadoop 2 retained. See the [Accumulo Migration page](accumulo-migration.md) for more information about this change.
The Accumulo store now supports Accumulo 2 and Hadoop 3 by default, with support for Accumulo 1 and Hadoop 2 retained. See the [Accumulo Migration page](../migrating-from-v1-to-v2/accumulo-migration.md) for more information about this change.

### Federated Store Improvements
The Federated Operation was added to greatly improve flexibility of using a Federated Store.
!!! danger "Breaking change"
To migrate, please see the [Federated Store Changes page](federation-changes.md).
To migrate, please see the [Federated Store Changes page](../migrating-from-v1-to-v2/federation-changes.md).

### Cache Improvements and fixes
All "caches" within Gaffer received a lot of bug fixes which should make them significantly more stable and consistent over time. This should improve usability of FederatedStores, NamedOperations and NamedViews.
@@ -18,12 +18,12 @@ All "caches" within Gaffer received a lot of bug fixes which should make them si
### Removal of Deprecated code
All of Gaffer 1's deprecated code has been removed.
!!! danger "Breaking change"
To migrate, please see the [deprecations](deprecations.md) page.
To migrate, please see the [deprecations](../migrating-from-v1-to-v2/deprecations.md) page.

### Dependency Upgrades
Dependencies have been updated, where possible, to the latest version, removing vulnerabilities.
!!! danger "Breaking change"
You will need to migrate your dependencies to be compatible with Gaffer 2's new dependency versions. Please see the [dependencies](dependencies.md) page for full details.
You will need to migrate your dependencies to be compatible with Gaffer 2's new dependency versions. Please see the [dependencies](../migrating-from-v1-to-v2/dependencies.md) page for full details.

### Federated and Proxy store fixes
A lot of bugs have been fixed that should facilitate FederatedStores with ProxyStores in them.
@@ -50,10 +50,10 @@ The HBase and Parquet stores have been removed from Gaffer in version 2. We made
There is now a maven profile that will swap dependency versions so you can build Gaffer with Java 11. The code has also been updated to build with both Java versions.

### Accumulo Kerberos Authentication Support
The Accumulo store now supports authenticating to Accumulo and HDFS using Kerberos, in addition to username/password. For more information, see the [Kerberos support page](accumulo-kerberos.md).
The Accumulo store now supports authenticating to Accumulo and HDFS using Kerberos, in addition to username/password. For more information, see the [Kerberos support page](../migrating-from-v1-to-v2/accumulo-kerberos.md).

### CSV Import and Export
Basic support for importing and exporting [CSVs](../getting-started/guide/csv.md) has been added.
Basic support for importing and exporting [CSVs](../../user-guide/query/api-querying/import-export-data.md) has been added.

### All operations can now be used within NamedOperations
Previously, `GetElementsBetweenSets` could not be used within a NamedOperation as it used `inputB`. `GetElementsBetweenSets` and `inputB` have both been deprecated and instead you should use `GetElementsBetweenSetsPairs`.
@@ -127,4 +127,4 @@ This will mean subgraphs added to FederatedStores can have additional operation
```

1. Schema left empty for brevity
2. This example operation enables file import. Read more in the [CSV](../getting-started/guide/csv.md) docs.
2. This example operation enables file import. Read more in the [CSV](../../user-guide/query/api-querying/import-export-data.md) docs.
@@ -32,7 +32,7 @@ The location of this config file can be specified using the `ACCUMULO_CLIENT_CONF
Other than this file, Accumulo libraries and configuration files do not need to be installed on the Gaffer host.

### Gaffer `store.properties` configuration
In addition to the usual [Accumulo Store settings](../reference/stores-guide/accumulo.md#properties-file), these extra options must be specified for Kerberos:
In addition to the usual [Accumulo Store settings](../../administration-guide/gaffer-stores/accumulo-store.md#properties-file), these extra options must be specified for Kerberos:
```
accumulo.kerberos.enable=true
accumulo.kerberos.principal=gaffer/[email protected]
@@ -40,7 +40,7 @@ In both serialisers, the method `deserialise(byte[])` has been marked as depreca
## Removal of Seed Matching

### [`operation.SeedMatching`](https://github.com/gchq/Gaffer/blob/gaffer2-1.21.1/core/operation/src/main/java/uk/gov/gchq/gaffer/operation/SeedMatching.java)
SeedMatching has been removed from Gaffer. This was previously used in [get](../reference/operations-guide/get.md) operations, like [`GetElements`](../reference/operations-guide/get.md#getelements), to select whether you wanted your results to contain only Elements that are the same type as the seed, or both Edges and Entities. For more info, see the Gaffer 1.X docs page on [SeedMatching](https://gchq.github.io/gaffer-doc/v1docs/getting-started/user-guide/filtering.html#seedmatching). As described in the Gaffer 1.X docs, `SeedMatching` can be replaced with a `View`. The default behaviour in Gaffer is the same as if you used `seed_matching="RELATED"`, so **if this is the case, there is no migration required**. However, if you used `seed_matching="EQUAL"`, you will need to migrate to a `View`.
SeedMatching has been removed from Gaffer. This was previously used in [get](../../reference/operations-guide/get.md) operations, like [`GetElements`](../../reference/operations-guide/get.md#getelements), to select whether you wanted your results to contain only Elements that are the same type as the seed, or both Edges and Entities. For more info, see the Gaffer 1.X docs page on [SeedMatching](https://gchq.github.io/gaffer-doc/v1docs/getting-started/user-guide/filtering.html#seedmatching). As described in the Gaffer 1.X docs, `SeedMatching` can be replaced with a `View`. The default behaviour in Gaffer is the same as if you used `seed_matching="RELATED"`, so **if this is the case, there is no migration required**. However, if you used `seed_matching="EQUAL"`, you will need to migrate to a `View`.
??? example "SeedMatching migration with EdgeSeeds"

Where SeedMatching was used to only get back Edges from EdgeSeeds
File renamed without changes.
25 changes: 0 additions & 25 deletions docs/dev/docker.md

This file was deleted.

@@ -43,7 +43,7 @@ the graph in [Neo4j syntax](https://neo4j.com/labs/apoc/4.4/export/csv/#export-d
!!! note ""
Please note that Gaffer often requires additional information about the data, such as
`:String` on the column headers, to help with typing of the values. This is demonstrated below
in the raw file. There's more detail on this in the [OpenCypher documentation](../advanced-guide/import-export/csv.md#opencypher-formats).
in the raw file. There's more detail on this in the [OpenCypher documentation](../../user-guide/query/api-querying/import-export-data.md#opencypher-formats).

=== "Table"
| _id | name | age | lang | _labels | _start | _end | _type | weight |
@@ -165,7 +165,7 @@ set the name and short description.

The store properties file is used to configure how Gaffer will store its data. There are a few
different stores available for Gaffer; these are explained in more detail in the [reference
documentation](../../reference/stores-guide/stores.md), but by default you must provide a store
documentation](../../administration-guide/gaffer-stores/store-guide.md), but by default you must provide a store
class and a store properties class. For this example we are using an Accumulo store as it is
recommended for efficient storage and retrieval of large data volumes. Its setup requires a few
custom properties, which are outlined in the following file.