Create data dumps and statistics (#29 and #25)
nichtich committed May 7, 2021
1 parent 6f63368 commit a9e65de
Showing 3 changed files with 40 additions and 0 deletions.
3 changes: 3 additions & 0 deletions .gitignore
@@ -19,3 +19,6 @@ config/config.user.js
.env
prisma/migrations/*
!prisma/migrations/migration_lock.toml

public/dump.ndjson
public/stats.json
5 changes: 5 additions & 0 deletions README.md
@@ -10,6 +10,7 @@ This repository contains an implementation of an API to analyze synthesized DDC
- [Install](#install)
- [Usage](#usage)
- [Preparing the database](#preparing-the-database)
- [Data dumps and statistics](#data-dumps-and-statistics)
- [Database migrations](#database-migrations)
- [Development](#development)
- [Production](#production)
@@ -65,6 +66,10 @@ node ./bin/convert --import ~/path/to/ou_liu_t_de-slim-21-02-15-1121

Add `--reset` to delete old records from the database.

### Data dumps and statistics

The script `./bin/stats.sh` creates a database dump in `public/dump.ndjson` unless that file already exists; delete the file to force a fresh dump. It then calculates some statistics and prints them as JSON, to be stored in `public/stats.json`.

### Database migrations

Sometimes, a database migration might be necessary. It will be mentioned in the release notes. In that case, please run the following command to migrate your database:
32 changes: 32 additions & 0 deletions bin/stats.sh
@@ -0,0 +1,32 @@
#!/bin/bash

DUMP=public/dump.ndjson

# To refresh the dump, delete the file first!
[[ -f "$DUMP" ]] || \
psql coli-ana --no-align -q -c '\t' -c 'SELECT "memberList" FROM data' | jq -c '{memberList:.}' > "$DUMP"

# Number of analyzed DDC notations
TOTAL=$(wc -l < "$DUMP")

# Of these, the number that were analyzed incompletely
INCOMPLETE=$(jq -r '.memberList[-1].notation[1]|select(endswith("-"))' "$DUMP" | wc -l)

# Number of distinct DDC classes that occur anywhere in the analyses
ELEMENTS=$(jq -r '.memberList[].notation[0]' "$DUMP" | sort -u | wc -l)

# Histogram of the number of elements per analysis
ELEMENTS_COUNT=$(
jq -r '.memberList|length' "$DUMP" | sort | uniq -c | sort -nk2 | sed 's/^ *//' \
| jq -R 'split(" ")|{key:.[1],value:(.[0]|tonumber)}' | jq -s 'from_entries'
)

cat <<-JSON
{
"numbers": $TOTAL,
"incomplete": $INCOMPLETE,
"elements": $ELEMENTS,
"elementsCount": $ELEMENTS_COUNT
}
JSON
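
The statistics above can be tried out without a database. The following is a minimal sketch that runs the same counts on a tiny, made-up dump. The three sample records and the temporary file are assumptions for illustration only and are not real coli-ana output. The histogram here is built with a single `jq` call using `group_by`, as an alternative to the `sort | uniq -c` pipeline in the script, not as the method the commit uses.

```shell
#!/bin/bash

# Hypothetical sample dump: one analysis per line, as in public/dump.ndjson.
DUMP=$(mktemp)
cat > "$DUMP" <<'NDJSON'
{"memberList":[{"notation":["7","7--"]},{"notation":["70","70-"]},{"notation":["700","700"]}]}
{"memberList":[{"notation":["7","7--"]},{"notation":["70","70-"]}]}
{"memberList":[{"notation":["5","5--"]},{"notation":["51","51-"]},{"notation":["510","510"]}]}
NDJSON

# Number of analyses (one per line)
TOTAL=$(wc -l < "$DUMP")

# Analyses whose last element still ends with "-", counted as incomplete
INCOMPLETE=$(jq -r '.memberList[-1].notation[1]|select(endswith("-"))' "$DUMP" | wc -l)

# Distinct class notations across all analyses
ELEMENTS=$(jq -r '.memberList[].notation[0]' "$DUMP" | sort -u | wc -l)

# Histogram of elements per analysis: slurp all records, group the lengths,
# and turn each group into a key/value pair for from_entries
ELEMENTS_COUNT=$(jq -sc 'map(.memberList|length) | group_by(.)
  | map({key:(.[0]|tostring), value:length}) | from_entries' "$DUMP")

echo "numbers=$TOTAL incomplete=$INCOMPLETE elements=$ELEMENTS elementsCount=$ELEMENTS_COUNT"

rm -f "$DUMP"
```

For this sample, the counts are 3 analyses, 1 of them incomplete, and 6 distinct classes, with one analysis of length 2 and two of length 3.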
