SEACRIFOG

This is a tool for exploring the inventory of carbon-related observation infrastructure. There are numerous metadata repositories describing, and linking to, datasets related to carbon measurement in some way or another. These datasets are rich, but not easily discovered by existing search tools such as Google Search.

The prototype (currently available at https://seacrifog.saeon.ac.za) provides an interactive overview of the infrastructure that supports carbon measurements. Users can select and deselect elements of the carbon observation infrastructure. This serves two purposes: it shows detailed information on the selected components of the system, and it constrains the search criteria applied against various organizations’ metadata repositories across the world (provided that these organizations make their repositories electronically searchable, which many do).

The prototype consists of a pair of software applications:

A long-running HTTP server that provides a publicly available API for interacting with the data representing the carbon observation platform model, and that acts as an adapter for specifying metadata searches constrained by some selection of the platform entities.
A browser client (website) that provides a richly interactive UI for interacting with the API.

The browser client is tightly coupled with the API logic. The API, however, can stand as a useful publicly available service in its own right.

Tech stack

Database
- PostGIS
API
- Node.js (server-side JavaScript framework)
- Express (web application framework)
- GraphQL (express-graphql)
- Node Postgres (database adapter)
Browser client
- ESNext (Babel, Webpack pre-compilation and bundling)
- React
- Apollo Client (GraphQL provider)
- React-MD (MIT licensed Material Design component library implementation)

Data model

API

The API provides HTTP endpoints, and a GraphQL interface. For the most part the HTTP endpoints are just stubs - they don't provide any real value at this point, but are instead a proof of concept that a GraphQL and RESTful API can share the data access layer completely (so it's fairly straightforward to provide both).

Using the API

The GraphQL API can be consumed via standard HTTP requests, with the request body a string representing a valid GraphQL query. A GraphQL IDE is available HERE. Below are some examples of how to fetch site data from the API:

# Fetch all sites that are located within the Africa region
curl -X POST https://api.seacrifog.saeon.ac.za/graphql -H "Content-Type: application/json" -d '{ "query": "{ sites { id name xyz } } "}'

# The above cURL command is implicitly the same as specifying the bounding box "POLYGON((-26 -40,-26 38,64 38,64 -40,-26 -40))", i.e. this cURL command should give the same results as the first one:
curl -X POST https://api.seacrifog.saeon.ac.za/graphql -H "Content-Type: application/json" -d '{ "query": "{ sites( extent: \"POLYGON((-26 -40,-26 38,64 38,64 -40,-26 -40))\") { id name xyz } } "}'

# To search a different extent (e.g. the whole planet), specify a suitable extent ("POLYGON((-180 -90, -180 90, 180 90, 180 -90, -180 -90))"):
curl -X POST https://api.seacrifog.saeon.ac.za/graphql -H "Content-Type: application/json" -d '{ "query": "{ sites( extent: \"POLYGON((-180 -90, -180 90, 180 90, 180 -90, -180 -90))\") { id name xyz } } "}'

For these examples, the contract of the API is that the extent argument accepts text that is valid WKT, and this is validated. The projection is difficult to validate, so the contract assumes EPSG:4326. WKT in a different projection will give strange results.
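The cURL examples above can also be expressed in JavaScript. This is a minimal sketch, not code from the repository: the function names are illustrative, and fetchSites assumes a runtime with a global fetch (Node.js 18+ or a browser). The query and the endpoint are the ones shown in the cURL commands.

```javascript
// Build a WKT polygon (EPSG:4326, x = longitude, y = latitude) from a bounding box.
// The ring is closed by repeating the first coordinate, as WKT requires.
function bboxToWkt(minX, minY, maxX, maxY) {
  const ring = [
    [minX, minY],
    [minX, maxY],
    [maxX, maxY],
    [maxX, minY],
    [minX, minY],
  ];
  return `POLYGON((${ring.map(([x, y]) => `${x} ${y}`).join(',')}))`;
}

// Build the JSON request body for the sites query used in the cURL examples.
// JSON.stringify adds and escapes the quotes around the WKT string.
function buildSitesRequestBody(extent) {
  const args = extent ? `(extent: ${JSON.stringify(extent)})` : '';
  return JSON.stringify({ query: `{ sites${args} { id name xyz } }` });
}

// POST the query to the GraphQL endpoint and return the parsed JSON response.
async function fetchSites(extent, endpoint = 'https://api.seacrifog.saeon.ac.za/graphql') {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: buildSitesRequestBody(extent),
  });
  if (!res.ok) throw new Error(`GraphQL request failed: ${res.status}`);
  return res.json();
}
```

Calling bboxToWkt(-26, -40, 64, 38) reproduces the Africa extent used in the second cURL command.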

Integrations

Integrations need to be specified by a user in two places. These are:

Logic for polling network/site information from an endpoint - this is currently in the form of a JavaScript function that is executed on a scheduled interval. An example of the integration with ICOS is included in the source code. Currently the source code of the API needs to be adjusted to include further integrations - but this is a straightforward change to make in the future.
Search logic needs to be specified per organization as JavaScript functions, referred to as executors. An example of the function contract is included in the source code. These functions are executed as child processes to the main Node.js process. Currently only JavaScript executors are supported, but it would be fairly straightforward to allow for interoperability between the API and executors in a variety of programming languages. To add a new executor, add an appropriate function to the source code and then redeploy the application.

Data access layer

Data access is directly via SQL using the Node Postgres PostgreSQL client, with a thin wrapper over the query functionality to handle connection pooling (hopefully) correctly. Because of how GraphQL queries are resolved, GraphQL APIs require request-level batching optimization from the very beginning - this is implemented, as is typically done, via the DataLoader library. All future work on the data access layer needs to implement database queries via this pattern. There are many references on how to use DataLoader in the context of this project.
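To illustrate the batching pattern, here is a sketch of a DataLoader-style batch function. It is not taken from the repository: the query argument stands in for the project's wrapper over Node Postgres, and the variables table name is assumed. What it relies on is the DataLoader contract that a batch function resolves to an array with the same length and order as the requested keys.

```javascript
// Align database rows to the requested keys. DataLoader expects one result per
// key, in key order; keys without a matching row become null.
function alignRowsToKeys(keys, rows, keyField = 'id') {
  const byKey = new Map(rows.map((row) => [row[keyField], row]));
  return keys.map((key) => byKey.get(key) ?? null);
}

// Hypothetical batch function factory. `query(sql, params)` is assumed to
// resolve to an object with a `rows` array, as Node Postgres does.
function makeVariablesBatchFn(query) {
  return async function batchVariables(ids) {
    // One round trip for every id requested while resolving a GraphQL query.
    const { rows } = await query('select * from variables where id = any($1)', [ids]);
    return alignRowsToKeys(ids, rows);
  };
}

// With the dataloader package installed, this would be wired up roughly as:
//   const loader = new DataLoader(makeVariablesBatchFn(query));
//   const variable = await loader.load(42);
```

Resolvers then call loader.load(id) individually, and the loader collects those calls into a single batched query.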

Client

The client is an SPA (Single Page Application), as is typical of React.js client apps. Architecturally, the client is organized into 'pages', each page comprising one or more 'modules'. Observational infrastructure is organized according to entity 'class'. For each entity class there is a page that lists all entities of that type (a list/explorer page), and an overview page that allows for viewing and editing a single entity. For example, all the entities of type Variable can be found on the HTTP path /variables, listed and searchable in a table. A single variable can be viewed and edited on the /variables/:id path. There is an exception - the /sites route displays a map of sites, along with proof-of-concept visualization charts. Individual sites can be edited on the /networks/:id path (sites of a particular network can be edited). Below is a representation of the site map:

.
├── /sites
├── /networks
│   └── /networks/:id
├── /variables
│   └── /variables/:id
├── /protocols
│   └── /protocols/:id
└── /search-results

Modules

The concept of modules with respect to the client refers to reusable React components. There is no definite difference between a component and a module in the context of SEACRIFOG. Essentially, at some point a component is considered large enough to be a module, and sometimes modules export a number of related components. These are defined in client/src/modules.

Atlas

The map is provided by OpenLayers 6, utilizing an API provided by a thin React.js wrapper library - @saeon/ol-react - authored by SAEON (at the time of writing there are no well-maintained OpenLayers 6 React.js wrapping libraries) and made available as MIT-licensed open source code. OpenLayers in the context of a JavaScript application is just a single object, olMap. This object keeps its own internal state and handles interactions internally. The @saeon/ol-react wrapper layer essentially provides the means of mapping React state to olMap internal state. This is achieved using the ECMAScript Proxy API. Note that Proxy is not supported by Internet Explorer and cannot be polyfilled. This tool as it currently exists should, however, work on Internet Explorer 11, but only because no advanced layer management is used. This will not be the case with further development. In addition to the layer proxy, the Atlas module provides a means of selecting/deselecting map features, and also for specifying layers.
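The idea of mapping declarative state onto an imperative map object with a Proxy can be sketched as follows. This is not the @saeon/ol-react implementation: FakeMap is a stand-in for olMap (only its addLayer and removeLayer method names mirror OpenLayers), and createLayerProxy is a hypothetical helper.

```javascript
// Stand-in for an OpenLayers map: records layers added or removed imperatively.
class FakeMap {
  constructor() {
    this.layers = [];
  }
  addLayer(layer) {
    this.layers.push(layer);
  }
  removeLayer(layer) {
    this.layers = this.layers.filter((l) => l !== layer);
  }
}

// Wrap a plain object of layers keyed by id. Assigning or deleting a key on the
// proxy is forwarded to the map, so the two stay in sync.
function createLayerProxy(map) {
  return new Proxy({}, {
    set(target, id, layer) {
      if (target[id]) map.removeLayer(target[id]); // replace an existing layer
      map.addLayer(layer);
      target[id] = layer;
      return true;
    },
    deleteProperty(target, id) {
      if (target[id]) map.removeLayer(target[id]);
      return delete target[id];
    },
  });
}

const map = new FakeMap();
const layers = createLayerProxy(map);
layers.sites = { name: 'sites' };
layers.base = { name: 'base' };
delete layers.sites;
console.log(map.layers.map((l) => l.name)); // logs the names of the layers still on the map
```

Because the set and deleteProperty traps have no equivalent in older engines, this approach cannot be polyfilled, which is the limitation noted above.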

Throughout the client the @saeon/ol-react component is used directly. The Atlas module consists of map-related exports that are reused wherever maps are shown (these maps use the same layers, styles, configurable sources, etc.).

DataMutation

A simple component that wraps Apollo Client's useMutation hook.

DataQuery

A simple component that wraps Apollo Client's useQuery hook.

EditorPage

A collection of components that are the basis of the 'editor' pages (/networks/:id, /variables/:id, and /protocols/:id). The components include headers, input field formatters, etc.

Typically web forms are bound to some model - often referred to as 'form model binding'. This conceptually allows for representation of some table/object state as an appropriate input field. This concept is also utilized in SEACRIFOG. All the edit pages make use of UI logic to draw editable forms from JavaScript objects (and provide a means of saving them to the database via GraphQL mutations).
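As a rough illustration of drawing a form from a JavaScript object, the sketch below derives field descriptors from a plain object and applies an edit immutably. The function names, the example entity and the mapping from JavaScript types to input types are all assumptions made for illustration; the EditorPage components may work differently.

```javascript
// Derive a list of form field descriptors from a plain object.
// The mapping from JavaScript types to input types is an illustrative assumption.
function fieldsFromObject(model) {
  return Object.entries(model).map(([name, value]) => {
    let type = 'text';
    if (typeof value === 'number') type = 'number';
    else if (typeof value === 'boolean') type = 'checkbox';
    return { name, type, value };
  });
}

// Apply a single field edit without mutating the original model, which is the
// shape of update that React state setters expect.
function updateField(model, name, value) {
  return { ...model, [name]: value };
}

// Hypothetical entity used only to exercise the helpers.
const variable = { id: 1, name: 'Air temperature', requires_validation: false };
const fields = fieldsFromObject(variable);
const edited = updateField(variable, 'name', 'Soil temperature');
```

The edited object is what would then be sent to the API in a GraphQL mutation.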

ExplorerPage

A collection of components that are the basis of the list/explorer pages (/networks, /variables, and /protocols). The components include headers, buttons, user-feedback messages, etc.

Layout

A collection of components that are used to draw the SPA. These include a <Footer /> component that is used on most pages as well as navigation-related components that are typically only used once as 'parents' to other components used throughout the application. URL path-based routing is handled by the react-router library.

Global state management & Search module

State is managed in three ways across the application:

Locally: state that is localized to individual components is achieved via stateful components (class components) or by using the newer React Hooks API
Explicitly: Stateful contexts are typically provided across groups of components via well-known React architectural patterns including explicitly passing props down component trees, or the render props pattern
Globally via the context API

A single global state module is used to keep track of user interactions across the app (selecting/deselecting of items). As entities are toggled a background search is performed for all currently selected search criteria - the results are stored in client memory.
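The selection state described here can be modelled as a small pure function, sketched below. The state shape (one list of selected IDs per entity class) and the function names are assumptions for illustration, not the module's actual API.

```javascript
// Initial global selection: one list of selected ids per entity class.
const initialSelection = { sites: [], networks: [], variables: [], protocols: [] };

// Return a new state with `id` added to, or removed from, the given entity class.
function toggleSelection(state, entityClass, id) {
  const current = state[entityClass] ?? [];
  const next = current.includes(id) ? current.filter((x) => x !== id) : [...current, id];
  return { ...state, [entityClass]: next };
}

// True when at least one entity is selected, i.e. when there are search
// criteria to run a background search with.
function hasSelection(state) {
  return Object.values(state).some((ids) => ids.length > 0);
}
```

Keeping the update pure makes it easy to hold the state in a React context and to trigger the background search whenever the state object changes.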

SharedComponents

For the most part, components are used directly as provided by the React-MD library - already a significant amount of work in terms of crafting reusable components! However, there are a few cases in this UI where 'grouped element trees' are reused in multiple places throughout the application. These include:

User-feedback messages (kept in a single place for consistency)
A controlled table that supports searching, sorting, and selecting rows (controlled meaning that state is handled by a parent component, so that selecting rows can update the global state)
A filterable list of items that can be selected/deselected - also controlled
The Side filter component used throughout the application. This component combines many instances of the DropdownSelect component, along with controlling callbacks to update the global state module. This component is used on most pages - it provides direct access to the current global state in terms of what is being filtered
ChartStateManagement - Interactive charts are shown as a proof of concept. The API currently requires that management of state is done via context
A form component - simple to place anywhere in the component tree, and provides localized state management for all elements in the sub tree.

Pages

The concept of pages with respect to the client refers to what is displayed at any particular URI. Pages comprise modules. These are defined in client/src/pages.

/home /

Static information mostly - partner logos are shown on this page.

/sites

The map is interactive in that it allows for assessing which variables are measured at which sites (or groups of sites). This is achieved by clicking features on the atlas, which both adds the selected sites to the metadata filter and triggers charts (provided by eCharts) to display.

/networks /variables /protocols

These routes display the list/explorer pages. Mostly the pages make use of the reusable components that comprise the ExplorerPage module.

/networks/:id /variables/:id /protocols/:id

These routes display editor pages for the various entities, utilizing reusable components that form the EditorPage module.

/search-results

The search results page comprises a tabbed layout in which the content of each tab is a list of search results. This is easy to see via a visual representation of the element tree that is rendered on this page:

.
├── Toolbar header
└── Tab container
    ├── Tab content (org 1)
    │   └── Virtualized list (handles many records)
    │       ├── RecordViewer
    │       │   └── OrgRenderer
    │       ├── RecordViewer
    │       │   └── OrgRenderer
    │       └── (many more records) ...
    ├── Tab content (org 2)
    │   └── Virtualized list
    │       ├── RecordViewer
    │       │   └── OrgRenderer
    │       ├── RecordViewer
    │       │   └── OrgRenderer
    │       └── (many more records) ...
    └── ...

The OrgRenderer object is passed as properties to the RecordViewer component. A list of OrgRenderer objects is provided as configuration to the item renderer (<RecordViewer org={OrgRenderedObj} />). Loosely speaking this pattern is referred to as dependency injection.

Renderer objects comprise a variety of callbacks that are passed individual records. These callbacks need to be user-defined to return the correct information per field. The configuration file shows all the organizations that have been integrated into SEACRIFOG. Further work on SEACRIFOG could involve providing a means of editing the configuration object from a web UI - this would allow organizations to 'register' how their metadata records should be displayed. (Note that a similar registration process would need to be implemented on the API so that users could also define how any organization could be searched).
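A renderer configuration of this kind might look roughly like the sketch below. The organization key, the callback names (title, link) and the record shape are hypothetical; the actual configuration file in the repository defines its own fields.

```javascript
// Hypothetical renderer configuration: one entry per organization, each a set of
// callbacks that extract display values from that organization's record shape.
const orgRenderers = {
  exampleOrg: {
    title: (record) => record.metadata?.title ?? 'Untitled',
    link: (record) => record.url ?? null,
  },
};

// Generic viewer logic: it knows nothing about record structure and relies
// entirely on the injected renderer.
function describeRecord(renderer, record) {
  return { title: renderer.title(record), link: renderer.link(record) };
}

const described = describeRecord(orgRenderers.exampleOrg, {
  metadata: { title: 'CO2 flux dataset' },
  url: 'https://example.org/record/1',
});
```

Supporting another organization then only requires adding another entry to the configuration object, without touching the viewer.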

Deployment

For a simple setup (PostGIS, the API and the Client all served from a single server), deploy the Node.js API, PostGIS database, and React client directly from the root of this repository using the provided docker-compose.yml file. For an example of how to configure an automated deployment pipeline using GitHub Actions (with additional API and client configuration), refer to the GitHub Actions workflow configurations in this repository.

echo "POSTGRES_PASSWORD=PASSWORD" > ./.env
docker-compose up -d --force-recreate --build

Current deployment information (as of February 2020)

PostGIS: Served via a Docker container (mdillon/postgis Docker image)
API: Docker container (refer to the Dockerfile in the source code)
Browser client: Docker container (refer to the Dockerfile in the source code)
Server: Single CentOS 7 virtual machine (2 cores, 2GB RAM, 60GB)

DEVELOPER DOCUMENTATION

This repository contains two separate applications - a client and an API. Dependencies are NOT shared between these projects. Set up the project after cloning this repository via the following steps:

Install project wide dependencies

npm install

Install dependencies for the client and API (this is also mentioned below)

NOTE: these projects also need configuration - refer to the documentation below.

npm --prefix api/ install
npm --prefix client/ install

Start the API and client together

This is a helpful script that will start the API and client in the same terminal window. Alternatively you can start the API and client from the root of their respective directories.

npm start

API DEVELOPER DOCUMENTATION

API usage documentation is coming soon! Below are instructions on how to contribute and deploy this software.

Quick start the API (local dev environment)

Start a PostGIS server

docker run -p 5432:5432 --name postgis -e POSTGRES_PASSWORD=password --restart always -d postgis/postgis:14-3.2

Create a database

Create a database called seacrifog, and run the SQL query to create the schema manually in DBeaver or some other IDE (this should run automatically on API startup).

Set up the DB

The .backup file is from an older version of PostgreSQL, and as a result some PostgreSQL clients don't read it. DBeaver - a decent, free DB IDE - has a PostgreSQL client that works by default (but any PostgreSQL client should work).

Log into a running PostGIS server
Create a DB called seacrifog_old
Restore (seacrifog-prototype.backup) to this database. It's located in this repository at api/src/db/
Make sure that FORCE_DB_RESET is set to true (see below). This only runs automatically for local development. When deploying to a server, adjust the credentials of the dblink connection and run the script manually

Once the seacrifog_old backup is restored, on application startup a new database will be initialized (seacrifog). The old data will be migrated to a new schema and the CSVs located in api/src/db/csvs will be imported as well. These are dummy data that are the result of work outputs prior to Work Package 5.4.

Work in the context of the API package

All the commands need to be run from the root of the API. Starting in the root of the seacrifog repository:

cd api

Install Node.js dependencies

npm install

Configure the API to re-create the database on startup

This is false by default (for obvious reasons!)

echo FORCE_DB_RESET=true > .env

Start the API

npm start

The application should be listening for connections on http://localhost:3000.

Deploying API to production

Configure a PostGIS database server somewhere
The application reads a .env file located at api/.env on startup. So to configure the API, as part of the deployment process create such a file and populate it with production-sensible values (refer to notes below on "API configuration")
Start the app: npm --prefix api/ run start:prod

API configuration

This is a sample of the environment variables that the app requires to run - specifically in the context of a .env file (with the default values shown).

# Example .env file with defaults
PORT=3000
ALLOWED_ORIGINS=http://localhost:3000,http://localhost:3001
POSTGRES_HOST=localhost
POSTGRES_USER=postgres
POSTGRES_DATABASE=seacrifog
POSTGRES_PASSWORD=password
POSTGRES_PORT=5432
FORCE_DB_RESET=false
INITIAL_CRON_WAIT=1000
ICOS_INTEGRATION_SCHEDULE=*/10 * * * *

PORT The port on which the application listens for HTTP requests

ALLOWED_ORIGINS Clients (that support CORS restrictions) from these addresses will be allowed to access the API resources

POSTGRES_* PostgreSQL connection configuration parameters

FORCE_DB_RESET When true, the database will be deleted and recreated on API startup

INITIAL_CRON_WAIT It can take a number of seconds for the API to settle on startup (for example if the database is being created). The CRON scheduler will only start jobs after this delay

ICOS_INTEGRATION_SCHEDULE Intervals between runs of the ICOS integration logic (this is to get station information from the ICOS database)
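The sketch below shows one way such variables could be turned into a typed configuration object with the defaults listed above. It is not the API's actual startup code: loadConfig is a hypothetical helper that takes an environment object (for example process.env, once the .env file has been loaded).

```javascript
// Defaults copied from the example .env file above.
const DEFAULTS = {
  PORT: '3000',
  ALLOWED_ORIGINS: 'http://localhost:3000,http://localhost:3001',
  POSTGRES_HOST: 'localhost',
  POSTGRES_USER: 'postgres',
  POSTGRES_DATABASE: 'seacrifog',
  POSTGRES_PASSWORD: 'password',
  POSTGRES_PORT: '5432',
  FORCE_DB_RESET: 'false',
  INITIAL_CRON_WAIT: '1000',
  ICOS_INTEGRATION_SCHEDULE: '*/10 * * * *',
};

// Merge an environment object over the defaults and convert the string values
// into numbers, booleans and lists where that is what the setting represents.
function loadConfig(env = {}) {
  const raw = {};
  for (const key of Object.keys(DEFAULTS)) raw[key] = env[key] ?? DEFAULTS[key];
  return {
    ...raw,
    PORT: parseInt(raw.PORT, 10),
    POSTGRES_PORT: parseInt(raw.POSTGRES_PORT, 10),
    INITIAL_CRON_WAIT: parseInt(raw.INITIAL_CRON_WAIT, 10),
    ALLOWED_ORIGINS: raw.ALLOWED_ORIGINS.split(',').map((origin) => origin.trim()),
    FORCE_DB_RESET: raw.FORCE_DB_RESET === 'true',
  };
}
```

Treating only the literal string true as enabling FORCE_DB_RESET keeps the destructive reset off unless it is requested explicitly.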

CLIENT DEVELOPER DOCUMENTATION

Quick start the client (local dev environment)

Once the API is set up, configure the client. This needs to be done in the context of the client package, meaning all the commands need to be run from the root of the client. Starting in the root of the seacrifog repository, execute the following commands to set up the environment:

cd client
npm install
npm start

Create a file .env, with the following contents:

GQL_ENDPOINT=http://localhost:3000/graphql
DOWNLOADS_ENDPOINT=http://localhost:3000/downloads

Some helpful Notes

Testing this on Windows (using npm via PowerShell), I had to install npm-run-all globally: npm install npm-run-all -g
Running npm install, some of the packages will install platform specific bindings. So if something isn't working try removing the node_modules directory and re-running npm install

Deploying Client to production

The application reads a .env located at client/.env during the Webpack build process. So to configure the client, as part of the deployment process (and prior to the build step) create such a file and populate it with production-sensible values (refer to notes below on "Client configuration")
Generate the build: npm run dist (from the root of the client package)
This will create a folder client/dist containing the client resources, with a typical index.html entry point. Serve via preferred HTTP server (Apache, Nginx, Node.js, etc.)

Some helpful Notes

The Dockerfile at client/Dockerfile encapsulates the above steps and should be usable in any deployment environment as is. Use the Dockerfile via the following commands:

# Change context to the client directory
cd client/

# Create an image
docker build -t seacrifog-client .

# Run a container, exposing relevant ports
docker run -p 80:80 seacrifog-client

Configuration

Configuration is read from Node's process.env during the build. The client configuration specifies, at build time, the address at which the client looks for the API.

# Example .env file with defaults
HTTP_ENDPOINT=https://api.seacrifog.saeon.ac.za/http
GQL_ENDPOINT=https://api.seacrifog.saeon.ac.za/graphql
DOWNLOADS_ENDPOINT=https://api.seacrifog.saeon.ac.za/downloads
DEFAULT_SELECTED_SITES=
DEFAULT_SELECTED_NETWORKS=
DEFAULT_SELECTED_VARIABLES=
DEFAULT_SELECTED_PROTOCOLS=

The DEFAULT_SELECTED_* configuration options are helpful for development, as they allow the application to be tested in various search states on app start. Specify a default selection via a comma separated list of IDs - DEFAULT_SELECTED_SITES=1,2,3,4,etc. IDs are relative to your local database, so this setting should not be pushed to the master branch (and therefore production).
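A comma separated list of this kind can be parsed as sketched below. parseIdList is a hypothetical helper, and it assumes that IDs are integers; the client's own parsing may differ.

```javascript
// Parse a comma separated list of ids such as "1,2,3" into an array of integers.
// Empty or missing values yield an empty array; non-numeric entries are dropped.
function parseIdList(value) {
  if (!value) return [];
  return value
    .split(',')
    .map((part) => parseInt(part.trim(), 10))
    .filter((id) => Number.isInteger(id));
}

// Example: the value of DEFAULT_SELECTED_SITES as it might appear in client/.env.
const defaultSites = parseIdList('1,2,3,4');
```

An unset option, as in the example .env file above, parses to an empty selection.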
