forked from tenforce/docker-virtuoso
Add mu scripts to verify data sync #9
Open. Jan-PieterBaert wants to merge 4 commits into master from datadump.
Commits (4):
30220ec Add script to dump all quads in the database (Jan-PieterBaert)
cd9b14d Add script to diff quad files (Jan-PieterBaert)
8039cd1 Add config example to data-diff script (Jan-PieterBaert)
28756c7 Make dump script output to a single quad file (Jan-PieterBaert)
@@ -0,0 +1,7 @@
{ "source": "/project/data/file1.nq",
  "target": "/project/data/file2.nq",
  "graphs": [
    "http://mu.semte.ch/application"
  ],
  "graphRegexes": ["http://mu.semte.ch/vocabularies/ext/tabId"]
}
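For reference, the shape of this config can be checked before the scripts consume it. The following is only a sketch: `validate_config` and `load_config` are hypothetical helpers that are not part of this PR, and the assumption that all four keys are required is inferred from how the scripts in this PR read them.

```python
import json

# Keys read by the data-diff scripts in this PR (assumed to be required).
REQUIRED_KEYS = ("source", "target", "graphs", "graphRegexes")


def validate_config(config):
    """Raise ValueError if the config lacks a key or a list has the wrong type."""
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("config is missing keys: " + ", ".join(missing))
    for key in ("graphs", "graphRegexes"):
        if not isinstance(config[key], list):
            raise ValueError(f"'{key}' must be a list")
    return config


def load_config(path):
    """Read a JSON config file and validate it."""
    with open(path) as f:
        return validate_config(json.load(f))


# The example config from this PR, as a Python dict.
example = {
    "source": "/project/data/file1.nq",
    "target": "/project/data/file2.nq",
    "graphs": ["http://mu.semte.ch/application"],
    "graphRegexes": ["http://mu.semte.ch/vocabularies/ext/tabId"],
}
validate_config(example)
```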
@@ -0,0 +1,11 @@
#!/bin/bash
apt-get update > /dev/null
apt-get -y install jq python3 > /dev/null

config=$1  # path to the JSON config file
source=$(jq -r ".source" "$config")
target=$(jq -r ".target" "$config")

command=$(python3 generate-datadiff.py "$config")  # grep pipeline that selects the quads to compare

diff <(cat "$source" | eval "$command" | sort) <(cat "$target" | eval "$command" | sort)
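The compare step of this script (filter both files, sort, diff) could be sketched in Python roughly as follows. This is an illustration under assumptions, not the script's implementation: `diff_quads` and its `keep` predicate are hypothetical names, and a set difference is used instead of `diff`, so duplicate lines within one file are not reported.

```python
def diff_quads(source_lines, target_lines, keep=lambda line: True):
    """Return (only_in_source, only_in_target) for the quads accepted by keep."""
    def selected(lines):
        stripped = (line.rstrip("\n") for line in lines)
        return {line for line in stripped if keep(line)}

    source = selected(source_lines)
    target = selected(target_lines)
    # Sorting makes the report deterministic, like sorting before diff.
    return sorted(source - target), sorted(target - source)


# Hypothetical sample quads, all in the graph from the config example.
GRAPH = "<http://mu.semte.ch/application> ."
quad_a = '<http://example.org/a> <http://example.org/p> "1" ' + GRAPH
quad_b = '<http://example.org/b> <http://example.org/p> "2" ' + GRAPH
quad_c = '<http://example.org/c> <http://example.org/p> "3" ' + GRAPH

only_in_source, only_in_target = diff_quads(
    [quad_a + "\n", quad_b + "\n"],
    [quad_b + "\n", quad_c + "\n"],
    keep=lambda line: line.endswith(GRAPH),
)
print(only_in_source)
print(only_in_target)
```

Passing the filter as a predicate keeps the comparison independent of how the quads are selected.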
@@ -0,0 +1,80 @@
#!/bin/bash
USERNAME=${2:-"dba"}
PASSWORD=${3:-"dba"}
TRIPLESTORE=${1:-"triplestore"}

if [[ "$#" -gt 3 ]]; then  # all three arguments are optional; more than three is a usage error
  echo "Usage:"
  echo "  mu script triplestore [hostname] [username] [password]"
  exit 1
fi

if [[ -d "/project/data/db" ]]; then
  mkdir -p /project/data/db/dumps
else
  echo "WARNING:"
  echo "    did not find data/db folder in your project, so did not create data/db/dumps!"
  echo " "
fi


echo "connecting to $TRIPLESTORE with $USERNAME"
isql-v -H "$TRIPLESTORE" -U "$USERNAME" -P "$PASSWORD" <<EOF
CREATE PROCEDURE dump_nquads
  ( IN dir VARCHAR := 'dumps'
  , IN start_from INT := 1
  , IN file_length_limit INTEGER := 100000000
  , IN comp INT := 1
  )
  {
    DECLARE inx, ses_len INT
  ; DECLARE file_name VARCHAR
  ; DECLARE env, ses ANY
  ;

    inx := start_from;
    SET isolation = 'uncommitted';
    env := vector (0,0,0);
    ses := string_output (10000000);
    FOR (SELECT * FROM (sparql define input:storage "" SELECT ?s ?p ?o ?g { GRAPH ?g { ?s ?p ?o } . FILTER ( ?g != virtrdf: ) } ) AS sub OPTION (loop)) DO
      {
        DECLARE EXIT HANDLER FOR SQLSTATE '22023'
          {
            GOTO next;
          };
        http_nquad (env, "s", "p", "o", "g", ses);
        ses_len := LENGTH (ses);
        IF (ses_len >= file_length_limit)
          {
            file_name := sprintf ('%s/output%06d.nq', dir, inx);
            string_to_file (file_name, ses, -2);
            IF (comp)
              {
                gz_compress_file (file_name, file_name||'.gz');
                file_delete (file_name);
              }
            inx := inx + 1;
            env := vector (0,0,0);
            ses := string_output (10000000);
          }
        next:;
      }
    IF (length (ses))
      {
        file_name := sprintf ('%s/output%06d.nq', dir, inx);
        string_to_file (file_name, ses, -2);
        IF (comp)
          {
            gz_compress_file (file_name, file_name||'.gz');
            file_delete (file_name);
          }
        inx := inx + 1;
        env := vector (0,0,0);
      }
  }
;
dump_nquads ('dumps', 1, 100000000, 1);
exit;
EOF
gunzip /project/data/db/dumps/*.gz
cat /project/data/db/dumps/* > /project/data/dumped-quads.nq
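The final two commands (decompress the chunks, then concatenate them into one quad file) can be sketched in Python as below. This is a rough equivalent under stated assumptions, not what the script runs: `merge_dump_chunks` is a hypothetical helper, it only picks up files named like the procedure's `output%06d.nq.gz` chunks, and unlike `gunzip` it leaves the compressed files in place.

```python
import glob
import gzip
import os
import shutil


def merge_dump_chunks(dump_dir, output_path):
    """Decompress every output*.nq.gz chunk in dump_dir into a single file."""
    # Zero-padded chunk numbers sort correctly as plain strings.
    chunk_paths = sorted(glob.glob(os.path.join(dump_dir, "output*.nq.gz")))
    with open(output_path, "wb") as merged:
        for chunk_path in chunk_paths:
            with gzip.open(chunk_path, "rb") as chunk:
                # Stream the chunk so large dumps are never held in memory at once.
                shutil.copyfileobj(chunk, merged)
    return chunk_paths


# Usage with the paths from the script above (not executed here):
# merge_dump_chunks("/project/data/db/dumps", "/project/data/dumped-quads.nq")
```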
@@ -0,0 +1,14 @@
import sys
import json

config_file = sys.argv[1]
with open(config_file) as f:
    config = json.load(f)

# Group the alternation so every graph is matched inside <...> at the end of the quad.
graph_regex = "<({})> \\.$".format("|".join(config['graphs']))
grep_commands = f'grep -E "{graph_regex}"'
for regex in config['graphRegexes']:
    grep_commands += f' | grep -E "{regex}"'

print(grep_commands)
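Once `graphs` holds more than one entry, how the alternation is grouped decides which quads pass the first filter. The sketch below compares a pattern of the form `<a|b> .$` with `<(a|b)> \.$`, using Python's `re` as a stand-in for extended grep (an assumption that holds for patterns this simple). The second graph URI and the sample quads are hypothetical.

```python
import re

# The first graph is from the config example; the second is a hypothetical addition.
graphs = ["http://mu.semte.ch/application", "http://mu.semte.ch/graphs/public"]

# Ungrouped: "|" splits the whole pattern, so the first alternative is just
# "<http://mu.semte.ch/application" and may match anywhere in the line.
ungrouped = "<{}> .$".format("|".join(graphs))

# Grouped: the alternation stays inside <...>, anchored at the end of the quad.
grouped = "<({})> \\.$".format("|".join(graphs))

# A quad in an unrelated graph whose subject happens to start with the first graph URI.
stray = ('<http://mu.semte.ch/application/thing/1> <http://example.org/p> "x" '
         '<http://example.org/other-graph> .')
# A quad that really is in one of the configured graphs.
wanted = ('<http://example.org/s> <http://example.org/p> "v" '
          '<http://mu.semte.ch/graphs/public> .')

for name, pattern in (("ungrouped", ungrouped), ("grouped", grouped)):
    print(name, bool(re.search(pattern, stray)), bool(re.search(pattern, wanted)))
```

With a single graph in the config both forms select the same quads, so the difference only shows up once the list grows.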
Our db normally already has this procedure, so is there any need to add it again?
We've been testing this on Kaleidos, and indeed just calling dump_nquads without re-defining the procedure works.
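A call-only variant, as described in this comment, might look like the sketch below. It assumes the procedure already exists in the database; `build_dump_call` and `run_dump` are hypothetical helpers, while the `isql-v` flags, the defaults and the `dump_nquads` arguments are copied from the script in this PR.

```python
import subprocess


def build_dump_call(hostname="triplestore", username="dba", password="dba"):
    """Return the isql-v argument list and the SQL to feed it on stdin."""
    argv = ["isql-v", "-H", hostname, "-U", username, "-P", password]
    # Only the call itself: no procedure definition is sent to the database.
    script = "dump_nquads ('dumps', 1, 100000000, 1);\nexit;\n"
    return argv, script


def run_dump(hostname="triplestore", username="dba", password="dba"):
    """Run the dump; requires isql-v on the PATH and a reachable triplestore."""
    argv, script = build_dump_call(hostname, username, password)
    return subprocess.run(argv, input=script, text=True, check=True)


argv, script = build_dump_call()
print(" ".join(argv))
print(script)
```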