[absolete]Connectors' markers implementation -- 010, 060, 069 #157

Closed
wants to merge 12 commits into from
36 changes: 28 additions & 8 deletions Utils/Dataflow/060_upload2virtuoso/uploadTTL.sh
@@ -9,7 +9,8 @@ PORT='8890'
GRAPH=
GRAPH_PATH='DAV/ATLAS'
MODE="f"
DELIMITER="NOT SPECIFIED"
BATCHMODE="d"
EOBatch='\x17'

# File .credentials may contain variable definition for USER and PASSWD
if [ -f ".credentials" ]; then
@@ -132,8 +133,7 @@ upload_files () {
upload_stream () {
EOProcess="\0"

local delimiter=$'\n'

local delimiter=''
[ -z "$TYPE" ] && { echo "(ERROR) input data format is not specified. Exiting." >&2; return 2;}
while [[ $# > 0 ]]
do
@@ -153,6 +153,8 @@ upload_stream () {
shift
done

[ -z "$delimiter" ] && { echo "(ERROR) Delimiter is not specified. Exiting." >&2; return 2;}

case $TYPE in
t|ttl)
cmd="$cmdTTL --data-urlencode res-file@-"
@@ -209,8 +211,12 @@ do
MODE="${2,,}"
shift
;;
-d|--delimiter)
DELIMITER=`echo -e $2`
-b|--batch
Collaborator

The closing parenthesis is missing:
-b|--batch)

BATCHMODE="${2,,}"
Collaborator

Why "${2,,}", not simply "$2"?
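
One possible answer, assuming the intent is to accept the value case-insensitively: `${2,,}` is the bash 4+ lowercase expansion, so `-b E` and `-b Enabled` would match the same `case` patterns as `-b e`. A minimal sketch (the helper name `normalize_batchmode` is hypothetical and not part of the script):

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the option parser: lowercase first, then match.
normalize_batchmode () {
  local mode="${1,,}"   # bash 4+: convert the whole value to lowercase
  case "$mode" in
    e|enabled)  echo enabled ;;
    d|disabled) echo disabled ;;
    *)          echo unknown ;;
  esac
}

normalize_batchmode E          # same result as "e"
normalize_batchmode Disabled   # same result as "disabled"
```

With plain `"$2"` the patterns would have to list every spelling explicitly.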

shift
;;
-B|--eob)
EOBatch=`echo -ne $2`
Collaborator

@mgolosova mgolosova Aug 13, 2018

I believe we had quite a long discussion about this mistake, which I made when I wrote this script :)
What happens when echo -ne $2 is used?

  1. If $2 is "\0" or "\n", the value of EOBatch will be the empty string: command substitution strips trailing newlines, and bash cannot store a NUL byte in a variable. See the example with \n:
[mgolosova@bamboo dkb]$ echo -e "abcde\n\n\n"                                                                                                                                                                
abcde



[mgolosova@bamboo dkb]$ echo `echo -ne "abcde\n\n\n"`
abcde
[mgolosova@bamboo dkb]$ 
  2. Even if $2 is not the code of such an "empty/space" character, here you get the interpreted value, while the default value is not interpreted: EOBatch='\x17'. What do you pass to upload_stream: the interpreted value or not? If not, what will be used as the delimiter in read -r -d "$delimiter"? (read takes only the first character of the -d argument, so for the literal string '\x17' that is the backslash.)
    See (mind 0a for newline and 0a 17 for newline+EOB in hexdump):
[mgolosova@bamboo dkb]$ EOBatch='\x17'                                                                                                                                                                             
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
>   | hexdump -C
00000000  6d 65 73 73 61 67 65 5f  31 0a 6d 65 73 73 61 67  |message_1.messag|
00000010  65 5f 32 0a 17 6d 65 73  73 61 67 65 5f 33 0a 6d  |e_2..message_3.m|
00000020  65 73 73 61 67 65 5f 34  0a 17                    |essage_4..|
0000002a
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
>   | while read -r -d "$EOBatch" line; do echo $line; done
[mgolosova@bamboo dkb]$ echo -n "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
>   | while read -r -d "$EOBatch" line; do echo $line; done                                                         
message_1
nmessage_2
n
x17message_3
nmessage_4
n
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
>   | while read -r -d "`echo -ne $EOBatch`" line; do echo $line; done                                             
message_1 message_2
message_3 message_4
[mgolosova@bamboo dkb]$ 

To sum up:

  • the variable should consistently hold either the interpreted or the uninterpreted value, wherever it is (re)assigned;
  • read must be passed the interpreted value;
  • we have to do something about the "empty/space" characters: read uses \0 as the delimiter if an empty string is passed, but right now I have no idea how to pass it \n :)
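
One way these three points could be addressed, as a sketch rather than the script's actual fix: keep the raw escape sequence in one variable and interpret it in a single place with `printf -v`, which assigns directly and therefore does not go through command substitution. The names `EOBatch_raw` and `EOMessage` below are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Raw (uninterpreted) marker, as it would arrive from the default or from --eob.
EOBatch_raw='\x17'

# Interpret it exactly once; printf -v does not strip a trailing newline.
printf -v EOBatch '%b' "$EOBatch_raw"

# Split a stream into batches on the interpreted marker.
batches=()
while IFS= read -r -d "$EOBatch" batch; do
  batches+=("$batch")
done < <(printf 'message_1\nmessage_2\n\x17message_3\nmessage_4\n\x17')

# The same approach keeps a newline marker intact.
printf -v EOMessage '%b' '\n'
messages=()
while IFS= read -r -d "$EOMessage" msg; do
  messages+=("$msg")
done < <(printf 'message_1\nmessage_2\n')

echo "batches: ${#batches[@]}, messages: ${#messages[@]}"
```

A marker of `\0` should end up as an empty string here, which `read -d` treats as NUL, as noted in the last point.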

shift
;;
-E|--eop)
@@ -244,10 +250,24 @@ done
[ -z "$HOST" ] && echo "(ERROR) empty host value." >&2 && exit 2
[ -z "$PORT" ] && echo "(ERROR) empty port value." >&2 && exit 2
[ -z "$GRAPH" ] && GRAPH=http://$HOST:$PORT/$GRAPH_PATH
[ -z "$DELIMITER" ] && DELIMITER=$'\0'
[ "x$DELIMITER" = "xNOT SPECIFIED" ] && DELIMITER=$'\n'

[ -n "$EOB" ] && EOBatch="$EOB"
Collaborator

What is the EOB variable used for? It is not assigned anywhere in the code...

Contributor Author

@mgolosova

I found an implicit mistake in my code thanks to that question. Thank you!

The thing is that I named all variables that hold command-line parameters like EOM/EOP/EOB (my mistake was naming this one "EOBatch"). In the 010 connector I saw that it would be better to pass the markers through separate variables, EOMessage/EOProcess, because using the EOM/EOP variables directly was a little inconvenient for me (given the code's logic).

I hope it will become much clearer after renaming. I'll create an extra commit with an explanation of my decision.

[ -n "$EOP" ] && EOProcess="$EOP"

case $BATCHMODE in
e|enabled)
[ -z "$EOB" ] && EOBatch="\n"
;;
d|disabled)
( [ -z "$EOB" ] || [ $EOB == "\x17" ] || [ $EOBatch == "\x17" ] ) && EOBatch="\n"
( [ -n "$EOB" ] && [ ! $EOB == "\x17" ] ) && EOBatch="$EOB"
Collaborator

I'm not sure I understand this case statement properly... It is similar to the one that puzzled me in 069_upload2es/load_data.sh, but here we have slightly different initial conditions...

First, what it says now (assuming that EOB is meant to be used like EOP, for the custom value passed with --eob):
"If batch mode is enabled, EOBatch is \n unless EOB is defined. If batch mode is disabled, EOBatch is \n unless EOB is defined with anything but \x17, in which case EOBatch is EOB."

Concerning the conditions... as we have these lines above:

...
    -B|--eob)
      EOB=`echo -ne $2` # allowed myself to replace EOBatch with EOB
...
[ -n "$EOB" ] && EOBatch="$EOB"

within the case:

  • if EOB is empty, EOBatch should be "\x17" (the default value);
  • if EOB is "\x17" (or, better to say, if user passed --eob \x17 in the command line), then EOB is not string "\x17", but ASCII code ETB;
  • if, after all, EOB is string "\x17", then EOBatch is "\x17" too.

So, in plain words: what should this case do?

Contributor Author

@anastasiakaida anastasiakaida Aug 14, 2018

@mgolosova

I intended the case to check the following conditions:

if batch mode is disabled (explicitly or by default):

  1. no EOB = --eob '' --> $EOB='\n'
  2. we have EOB, but \x17 (hex) --> $EOB='\n' (it might be unnecessary and too strict, but I'm not sure)
  3. we have any other EOB --> $EOB='EOB'

if batch mode is enabled:

  1. only if we type --eob '' --> $EOB='\n'

UPD: it seems I forgot about the quotes in the case entirely, shame on me. :(

Collaborator

Alright. I think things got a little messy with the enabled/disabled batch mode etc...


First, about this (just to make sure we understand things the same way).
@anastasiakaida wrote:

we have EOB, but \x17 (hex) --> $EOB='\n' (it might be unnecessary and too strict, but I'm not sure)

These checks:

[ ! $EOB == "\x17" ]
[ $EOB == "\x17" ]

are string comparisons. They work this way:

mgolosova@artois-K:~$ a_str="\x17"
mgolosova@artois-K:~$ a_ascii=$'\x17'
mgolosova@artois-K:~$ echo "str: '$a_str'; ASCII: '$a_ascii'"
str: '\x17'; ASCII: '�'
mgolosova@artois-K:~$ [ "$a_str" == "\x17" ] && echo "True" || echo "False"
True
mgolosova@artois-K:~$ [ "$a_ascii" == "\x17" ] && echo "True" || echo "False"
False

Now EOB is assigned a string value, so it's OK; I just wanted to make things clear.


Next, about the general logic.
I have tested it with the attached script: test_batchmode.txt

Here are the results:

   -b   ||  X   |  X   |   X    |   X    || 'e'  | 'e'  |  'e'   |  'e'   |
------- || ---- | ---- | ------ | ------ || ---- | ---- | ------ | ------ | 
   -B   ||  X   |  ''  | '\x17' | '\x11' ||  X   |  ''  | '\x17' | '\x11' |
======= || ==== | ==== | ====== | ====== || ==== | ==== | ====== | ====== |
EOBatch || '\n' | '\n' |  '\n'  | '\x11' || '\n' | '\n' | '\x17' | '\x11' |

How I read it: batch mode is actually enabled in three cases:

  • -B 'non-default marker';
  • -b -B 'default marker';
  • -b -B 'non-default marker'.

While I would expect the following table:

   -b   ||  X   |  X   |   X    |   X    ||  'e'   | 'e'  |  'e'   |  'e'   |
------- || ---- | ---- | ------ | ------ || ------ | ---- | ------ | ------ | 
   -B   ||  X   |  ''  | '\x17' | '\x11' ||   X    |  ''  | '\x17' | '\x11' |
======= || ==== | ==== | ====== | ====== || ====== | ==== | ====== | ====== |
EOBatch || '\n' | '\n' | '\x17' | '\x11' || '\x17' | '\n' | '\x17' | '\x11' |

Two cases added:

  • -b;
  • -B 'default marker'.

In other words:

  • it shouldn't matter whether the marker the user passed is the default one or not: if it is passed explicitly, it is the preferred value;
  • if the user said "enable batch mode", then we should use the default EOB for batch mode.
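
The expected table above could be implemented roughly as follows. This is only a sketch: the helper `resolve_eob` and the variable `DEFAULT_EOB` are hypothetical names, the function prints the raw (uninterpreted) marker, and "-B was not passed" is modelled by omitting the second argument.

```shell
#!/usr/bin/env bash
DEFAULT_EOB='\x17'   # default batch marker, as in the script under review

# resolve_eob BATCHMODE [EOB]
# Prints the raw marker that should end up in EOBatch.
resolve_eob () {
  local batchmode="${1,,}"
  if [ $# -ge 2 ]; then
    # -B was passed explicitly: its value wins, even if it equals the default.
    # An explicitly empty value falls back to newline.
    if [ -n "$2" ]; then printf '%s' "$2"; else printf '%s' '\n'; fi
  else
    case "$batchmode" in
      e|enabled) printf '%s' "$DEFAULT_EOB" ;;  # -b alone: default batch marker
      *)         printf '%s' '\n' ;;            # no batch mode: newline
    esac
  fi
}

resolve_eob d; echo            # batch mode off, no -B
resolve_eob d '\x17'; echo     # -B with the default marker
resolve_eob e; echo            # -b alone
```

Distinguishing "not passed" from "passed as empty" is what makes the `-b` and `-b -B ''` columns differ; in the option parser this would need a separate flag or an unset check such as `${EOB+set}`.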

Collaborator

*) And about the forgotten quotes: that was totally OK; the shell treats unquoted text as a string unless it is a keyword or is followed by = (which makes it a variable assignment). Quotes are required for multi-word strings (though these can also be written without quotes if the spaces are escaped) and to escape special characters.
See:

mgolosova@artois-K:~$ a=abcdef
mgolosova@artois-K:~$ echo $a
abcdef
mgolosova@artois-K:~$ echo abcdef
abcdef
mgolosova@artois-K:~$ a=abc def
def: command not found
mgolosova@artois-K:~$ a=abc\ def
mgolosova@artois-K:~$ echo $a
abc def
mgolosova@artois-K:~$ echo abc\def
abcdef
mgolosova@artois-K:~$ echo abc\ def
abc def
mgolosova@artois-K:~$ echo "abc\ def"
abc\ def
mgolosova@artois-K:~$ echo "abc def"
abc def

So in a case statement it is OK to write strings without quotes (values are expected there, not keywords or anything else); moreover, if a wildcard '*' is used, quotes escape it, so it is taken as a literal asterisk rather than a wildcard:

mgolosova@artois-K:~$ a=abcd
mgolosova@artois-K:~$ case $a in
> "a*")
>   echo 'Quoted case: "a*"';;
> a*)
>   echo 'Non quoted case: a*';;
> *)
>   ;;
> esac
Non quoted case: a*

In this particular case quotes are fine to use as well, since no wildcards or similar are involved.
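
The remarks above concern literal strings and case patterns. Variable expansions inside `[ ... ]`, which the reviewed block also uses, are a different matter: an unquoted empty variable disappears from the argument list. A small sketch, assuming EOB is empty because --eob was not passed:

```shell
#!/usr/bin/env bash
EOB=''

# Unquoted: expands to `[ == "\x17" ]`, which test rejects as malformed.
[ $EOB == "\x17" ] 2>/dev/null && unquoted_status=0 || unquoted_status=$?

# Quoted: an ordinary comparison of the empty string with the literal \x17.
[ "$EOB" == "\x17" ] && quoted_status=0 || quoted_status=$?

echo "unquoted: $unquoted_status, quoted: $quoted_status"
```

In bash the unquoted form exits with status 2 (an error) and the quoted form with status 1 (a plain "false"), so a check like `[ -z "$EOB" ] || [ $EOB == "\x17" ]` only stays safe as long as the first test short-circuits.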

;;
*)
echo "(ERROR) Unexpected batch-mode parameter." >&2 && { [ -n "$BATCHMODE" ] && usage && return 2; }
break
;;
esac

cmdTTL="curl --retry 3 -s -f -X POST --digest -u $USER:$PASSWD -H Content-Type:text/turtle -G http://$HOST:$PORT/sparql-graph-crud-auth --data-urlencode graph=$GRAPH"
cmdSPARQL="curl --retry 3 -s -f -H 'Accept: text/csv' -G http://$HOST:$PORT/sparql --data-urlencode query"

@@ -256,7 +276,7 @@ case $MODE in
upload_files $*;
;;
s)
upload_stream -d "$DELIMITER";
upload_stream -d "$EOBatch";
;;
*)
echo "(ERROR) $MODE: unsupported mode." >&2