This is part of the DataCite Usage Analytics service.
It exposes a public API that serves as the main endpoint for the DataCite Tracker. Events are stored in a ClickHouse database, from which statistics can be calculated according to COUNTER.
- Install the DataCite Tracker tracking script.
- Configure it with the appropriate details.
- Results should be sent to the /api/metric endpoint.
- Use the check API endpoint /api/check/{repo_id} to see whether results are being recorded. On success it returns 200 and the timestamp of the last event.
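The check step above can be scripted. The following is a minimal Go sketch, not part of the service itself: the function names `checkURL` and `lastEvent` are hypothetical, the base URL `http://localhost:8081` is an assumption taken from the Docker port mapping further down, and the response body is returned unparsed because its exact format is not described here.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strings"
	"time"
)

// checkURL builds the check endpoint URL for a repository ID.
// (Hypothetical helper; only the /api/check/{repo_id} path comes from the README.)
func checkURL(baseURL, repoID string) string {
	return strings.TrimRight(baseURL, "/") + "/api/check/" + url.PathEscape(repoID)
}

// lastEvent calls the check endpoint and returns the raw response body.
// Any status other than 200 is reported as an error.
func lastEvent(client *http.Client, baseURL, repoID string) (string, error) {
	resp, err := client.Get(checkURL(baseURL, repoID))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("check failed: status %d: %s", resp.StatusCode, body)
	}
	return string(body), nil
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}

	// Assumed base URL and repository ID; replace with your own values.
	out, err := lastEvent(client, "http://localhost:8081", "datacite.demo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Println(out)
}
```

Treating every non-200 status as an error keeps the sketch independent of how the service reports a repository with no recorded events.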
Based on the data stored in ClickHouse and the statistics generated from it, usage reports can be produced in SUSHI JSON format. These can then be sent to the DataCite Reports API for storage and processing into DataCite Event Data.
Requirements:
- Go 1.19
Configuration is taken from the environment:
- ANALYTICS_DATABASE_HOST - ClickHouse database URL
- ANALYTICS_DATABASE_USER - ClickHouse user
- ANALYTICS_DATABASE_PASSWORD - ClickHouse password
- ANALYTICS_DATABASE_DBNAME - ClickHouse database name
- VALIDATE_DOI_EXISTENCE - Enables or disables DOI existence validation for event tracking. Defaults to true.
- VALIDATE_DOI_URL - Enables or disables DOI URL validation for event tracking. Defaults to false.
- DATACITE_API_URL - Used only when storing events, as part of DOI validation.
- JWT_PUBLIC_KEY - Used on authenticated endpoints to validate DataCite JWTs.
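As an illustration of how these settings and their defaults fit together, here is a minimal Go sketch that reads them from the environment. It is not the service's actual loader: the `Config` struct, `boolOr`, and `loadConfig` are hypothetical names, and parsing the two flags with `strconv.ParseBool` is an assumption about which values count as true or false.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config mirrors the environment variables listed above.
// The struct and its field names are illustrative only.
type Config struct {
	DBHost               string
	DBUser               string
	DBPassword           string
	DBName               string
	ValidateDOIExistence bool
	ValidateDOIURL       bool
	DataCiteAPIURL       string
	JWTPublicKey         string
}

// boolOr parses value as a boolean and falls back to def when it is empty.
// Assumption: values follow strconv.ParseBool ("true", "false", "1", "0", ...).
func boolOr(name, value string, def bool) (bool, error) {
	if value == "" {
		return def, nil
	}
	b, err := strconv.ParseBool(value)
	if err != nil {
		return def, fmt.Errorf("%s: invalid boolean %q", name, value)
	}
	return b, nil
}

// loadConfig reads all settings through getenv, so it can be used with
// os.Getenv in production and with a stub lookup elsewhere.
func loadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		DBHost:         getenv("ANALYTICS_DATABASE_HOST"),
		DBUser:         getenv("ANALYTICS_DATABASE_USER"),
		DBPassword:     getenv("ANALYTICS_DATABASE_PASSWORD"),
		DBName:         getenv("ANALYTICS_DATABASE_DBNAME"),
		DataCiteAPIURL: getenv("DATACITE_API_URL"),
		JWTPublicKey:   getenv("JWT_PUBLIC_KEY"),
	}

	var err error
	// Documented defaults: existence validation on, URL validation off.
	cfg.ValidateDOIExistence, err = boolOr("VALIDATE_DOI_EXISTENCE", getenv("VALIDATE_DOI_EXISTENCE"), true)
	if err != nil {
		return cfg, err
	}
	cfg.ValidateDOIURL, err = boolOr("VALIDATE_DOI_URL", getenv("VALIDATE_DOI_URL"), false)
	if err != nil {
		return cfg, err
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print only non-secret settings.
	fmt.Printf("host=%s db=%s validateExistence=%t validateURL=%t\n",
		cfg.DBHost, cfg.DBName, cfg.ValidateDOIExistence, cfg.ValidateDOIURL)
}
```

Passing the lookup function as a parameter is a design choice of the sketch; it keeps the defaults easy to exercise without touching the real process environment.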
# Start the http server
go run cmd/web/main.go
# Build the Docker image
docker build -f ./docker/web/Dockerfile -t keeshondweb .
# Run the image
docker run -p 8081:8081 --rm -ti keeshondweb
Report generation is triggered via a worker script. Note that this automatically submits the usage report to the Usage Reports API.
The variables needed for report generation are taken from environment variables:
- REPO_ID - The unique tracking ID for a repository, used to select which stats to collect. This is assigned by DataCite.
- BEGIN_DATE - The reporting period start date. Typically this is the start of a month.
- END_DATE - The reporting period end date. Typically this is the end of a month.
- PLATFORM - The name or identifier of the platform that the usage is from.
- PUBLISHER - The name of the publisher of the dataset.
- PUBLISHER_ID - The identifier of the publisher of the dataset.
In addition, a valid DataCite JWT must be supplied for authentication and submission to the Usage Reports API:
- DATACITE_JWT - Valid JWT with correct permissions. This is assigned by DataCite.
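Because the worker submits the report automatically, it can help to sanity-check these variables before running it. The sketch below is one way to do that in Go and is not the worker's own code: `ReportParams` and `parseReportParams` are hypothetical names, the `YYYY-MM-DD` date layout is inferred from the example values in this README, and treating every variable as required is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"time"
)

// ReportParams holds the report settings described above.
// The type and field names are illustrative only.
type ReportParams struct {
	RepoID      string
	Platform    string
	Publisher   string
	PublisherID string
	JWT         string
	Begin       time.Time
	End         time.Time
}

// dateLayout is assumed from the README's example dates (e.g. 2022-01-01).
const dateLayout = "2006-01-02"

// parseReportParams reads and validates the report variables via getenv.
func parseReportParams(getenv func(string) string) (ReportParams, error) {
	var p ReportParams

	// Assumption: all of these must be set for a report to be generated.
	required := []string{
		"REPO_ID", "BEGIN_DATE", "END_DATE",
		"PLATFORM", "PUBLISHER", "PUBLISHER_ID", "DATACITE_JWT",
	}
	for _, name := range required {
		if getenv(name) == "" {
			return p, fmt.Errorf("%s is not set", name)
		}
	}

	begin, err := time.Parse(dateLayout, getenv("BEGIN_DATE"))
	if err != nil {
		return p, fmt.Errorf("BEGIN_DATE: %w", err)
	}
	end, err := time.Parse(dateLayout, getenv("END_DATE"))
	if err != nil {
		return p, fmt.Errorf("END_DATE: %w", err)
	}
	if end.Before(begin) {
		return p, errors.New("END_DATE is before BEGIN_DATE")
	}

	p = ReportParams{
		RepoID:      getenv("REPO_ID"),
		Platform:    getenv("PLATFORM"),
		Publisher:   getenv("PUBLISHER"),
		PublisherID: getenv("PUBLISHER_ID"),
		JWT:         getenv("DATACITE_JWT"),
		Begin:       begin,
		End:         end,
	}
	return p, nil
}

func main() {
	p, err := parseReportParams(os.Getenv)
	if err != nil {
		// A real pre-flight check would likely exit non-zero here.
		fmt.Fprintln(os.Stderr, "invalid report settings:", err)
		return
	}
	fmt.Printf("report for %s from %s to %s\n",
		p.RepoID, p.Begin.Format(dateLayout), p.End.Format(dateLayout))
}
```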
A report can be triggered using the worker version of the application.
For example (note: this assumes the general configuration, i.e. the ClickHouse database connection, has been set up):
REPO_ID=datacite.demo BEGIN_DATE=2022-01-01 END_DATE=2022-12-31 PLATFORM=datacite PUBLISHER="datacite demo" PUBLISHER_ID=datacite.demo go run cmd/worker/main.go
Note: This assumes the general configuration, i.e. the ClickHouse database connection, has been set up.
# Build worker image
docker build -f ./docker/worker/Dockerfile -t keeshondworker .
# Run docker with env vars
docker run --network="host" --env REPO_ID=datacite.demo --env BEGIN_DATE=2022-01-01 --env END_DATE=2022-12-31 --env PLATFORM=datacite --env PUBLISHER="datacite demo" --env PUBLISHER_ID=datacite.demo keeshondworker
# Connect to the local docker Clickhouse database container
clickhouse client --user=keeshond --password=keeshond