-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[absolete]Connectors' markers implementation -- 010, 060, 069 #157
Conversation
EOProcess: -E|--eop EOMessage: -e|--eom *No default EOM/EOP values specified *EOM/EOP variables ($EOM, $EOP) are not applied now in this .sh script
If -p or -f specified -- it's "file mode". Otherwise -- "stream mode". $EOM and $EOP variables are internal and keep data from command line. $EOMarkers and $EOProcess variables are for output.
Default: (f)ile mode -- none (s)tream mode -- '\0'
For batch-mode processing that option should be enabled: -b|--batch (e)enabled|(d)isabled Default: disabled If EOB (-B|--eob) is specifiend with a custom marker (not "\x17" -- default), batch-mode will be enabled anyway. In other cases batch-mode should be enabled. Note: Empty EOB = disabled batch mode.
I've met some collisiong in #130, so I decided to close the previous one and open a new one. Please, notice me if I forgot to fix something. :) PS: I still don't know that to do with the last line fix in 19175cd. I would like to use 'strict realization', but I'm not sure what to do firstly. Should I leave it as it is now or make some improvements? |
@@ -119,7 +120,8 @@ upload_files () { | |||
;; | |||
esac | |||
|
|||
eval "$cmd" || { echo "(ERROR) An error occured while uploading file: $INPUTFILE" >&2; continue; } | |||
eval "$cmd" || { echo "(ERROR) An error occured while uploading file: $INPUTFILE" >&2; } & | |||
echo -ne "$EOProcess" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened here?
"custom EOP implementation" makes me think that the answer should be "Custom EOP marker output added". But the diff says:
continue
directive removed;eval "$cmd" || echo
is sent to background;- custom EOP output added.
What are the first two changes are made for?
Update: by the way, I see I made a comment to this change in #130; now I see addressed only the part concerning continue
(and it was done quite silently, as if anyone reading this commit must know about our conversation).
If you think the continue
directive should be deleted -- explain it in commit log; if you just follow my faint comment "I`m not sure if it is needed here..." -- then do not. Ask why it is not needed. Then think, does this change belong to this commit or not. Please, do not try to guess what do I expect you to do... if you know -- do it and show that you understand what and why was done. If not -- ask, and ask again if needed, and do it till you understand the idea and can explain even better than me.
What about everything else I wrote there?
-d|--delimiter) | ||
DELIMITER=`echo -e $2` | ||
-b|--batch | ||
BATCHMODE="${2,,}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why "${2,,}"
, not simply "$2"
?
[ -z "$DELIMITER" ] && DELIMITER=$'\0' | ||
[ "x$DELIMITER" = "xNOT SPECIFIED" ] && DELIMITER=$'\n' | ||
|
||
[ -n "$EOB" ] && EOBatch="$EOB" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What the EOB
variable is used for? It is not assigned anywhere in the code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found an implicit mistake in my code because of that question. Thank you!
The thing is that I named all variables which keeps parameters from a comand line like EOM/EOP/EOB (my mistake is that I named it as "EOBatch"). In 010 connector I saw that it would be better to pass markers from another variables EOMessage/EOProcess, because it was a little bit inconvenient for me (according to the code's logic) to use variables EOM/EOP.
I hope, after renaming it'll become much more clear. I'll create an extra commit with explanation of my decision.
shift | ||
;; | ||
-B|--eob) | ||
EOBatch=`echo -ne $2` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have quite a long discussion about this mistake I`ve done when wrote this script :)
What happens when echo -ne $2
is used?
- If
$2
was"\0"
or"\n"
, the value ofEOBatch
will be the empty string, as shell trims all empty/space symbols from command substitution result. See in the example with\n
:
[mgolosova@bamboo dkb]$ echo -e "abcde\n\n\n"
abcde
[mgolosova@bamboo dkb]$ echo `echo -ne "abcde\n\n\n"`
abcde
[mgolosova@bamboo dkb]$
- Even if
$2
was not a code of "empty/space" symbol: here you get the interpreted value, while the default value is not interpreted:EOBatch='\x17'
. What do you pass toupload_stream
? Interpreted value or not? If not -- what will be used as a delimiter inread -r -d "$delimiter"
?
See (mind0a
for newline and0a 17
for newline+EOB in hexdump):
[mgolosova@bamboo dkb]$ EOBatch='\x17'
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
> | hexdump -C
00000000 6d 65 73 73 61 67 65 5f 31 0a 6d 65 73 73 61 67 |message_1.messag|
00000010 65 5f 32 0a 17 6d 65 73 73 61 67 65 5f 33 0a 6d |e_2..message_3.m|
00000020 65 73 73 61 67 65 5f 34 0a 17 |essage_4..|
0000002a
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
> | while read -r -d "$EOBatch" line; do echo $line; done
[mgolosova@bamboo dkb]$ echo -n "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
> | while read -r -d "$EOBatch" line; do echo $line; done
message_1
nmessage_2
n
x17message_3
nmessage_4
n
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
> | while read -r -d "`echo -ne $EOBatch`" line; do echo $line; done
message_1 message_2
message_3 message_4
[mgolosova@bamboo dkb]$
Resume:
- variable should always contain interpreted or not interpreted value, whenever it was (re)assigned;
- to
read
should be passed interpreted value; - we have to do something with "empty/space" symbols:
read
uses\0
as delimiter if empty string is passed, but I have no idea right now how to pass it\n
:)
log ERROR "Unexpected batch-mode parameter." && { [ -n "$BATCHMODE" ] && usage && return 2; } | ||
break | ||
;; | ||
esac |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I`m getting this case
statement.
What makes me puzzled:
BATCHMODE
is just assigned with"d"
(rewriting user value); why usecase $BATCHMODE
, when we already know the value for sure?EOBatch
is just assigned with default value'\x17'
; why compare it with"\x17"
again?- after this
case
,EOBatch
will stay assigned with'\x17'
only when batchmode isenabled
andEOB
is not empty; - after this
case
,EOBatch
will be assigned with value ofEOB
only when batchmode isdisabled
andEOB
is not empty and not"\x17"
.
What did you mean to do?
;; | ||
d|disabled) | ||
( [ -z "$EOB" ] || [ $EOB == "\x17" ] || [ $EOBatch == "\x17" ] ) && EOBatch="\n" | ||
( [ -n "$EOB" ] && [ ! $EOB == "\x17" ] ) && EOBatch="$EOB" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I`m getting this case
statement properly... It is similar to the one puzzled me in 069_upload2es/load_data.sh
, but here we have slightly different initial conditions...
First, what does it say now (suggesting, that EOB
should be used as EOP
, for custom value passed with --eob
):
"If batch mode enabled, EOBatch is \n
unless EOB is defined. If batch mode is disabled, EOBatch is \n
unless EOB is defined with anything but \x17
-- else EOBatch is EOB."
Concerning conditions... as we have this lines above:
...
-B|--eob)
EOB=`echo -ne $2` # allowed myself to replace EOBatch with EOB
...
[ -n "$EOB" ] && EOBatch="$EOB"
within the case
:
- if
EOB
is empty,EOBatch
should be"\x17"
(the default value); - if
EOB
is"\x17"
(or, better to say, if user passed--eob \x17
in the command line), thenEOB
is not string"\x17"
, but ASCII code ETB; - if, after all,
EOB
is string"\x17"
, thenEOBatch
is"\x17"
too.
So, in plain words -- what should be done during this case
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I supposed that case
allows here to check the following conditions:
if batch mode is disabled (by hands or by default):
- no EOB =
--eob ''
-->$EOB='\n'
- we have EOB, but \x17 (hex) -->
$EOB='\n'
(it might be unnecessary and too strict, but I'm not sure) - we have any other EOB -->
$EOB='EOB'
if batch mode is enabled:
- only if we type
--eob ''
-->$EOB='\n'
UPD: it seems I forgot about quotes at all in case
, shame on me. :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. I think there`s a little bit of mess happened with enabled/disabled batch mode etc...
First, about this (just to make sure we do understand things same way).
@anastasiakaida wrote:
we have EOB, but \x17 (hex) --> $EOB='\n' (it might be unnecessary and too strict, but I'm not sure)
This checks:
[ ! $EOB == "\x17" ]
[ $EOB == "\x17" ]
are string comparison. Works this way:
mgolosova@artois-K:~$ a_str="\x17"
mgolosova@artois-K:~$ a_ascii=$'\x17'
mgolosova@artois-K:~$ echo "str: '$a_str'; ASCII: '$a_ascii'"
str: '\x17'; ASCII: '�'
mgolosova@artois-K:~$ [ "$a_str" == "\x17" ] && echo "True" || echo "False"
True
mgolosova@artois-K:~$ [ "$a_ascii" == "\x17" ] && echo "True" || echo "False"
False
Now EOB
is assigned with string value, so it`s OK; just wanted to make things clear.
Next, about the general logic.
I have tested it with attached script: test_batchmode.txt
Here`s the results:
-b || X | X | X | X || 'e' | 'e' | 'e' | 'e' |
------- || ---- | ---- | ------ | ------ || ---- | ---- | ------ | ------ |
-B || X | '' | '\x17' | '\x11' || X | '' | '\x17' | '\x11' |
======= || ==== | ==== | ====== | ====== || ==== | ==== | ====== | ====== |
EOBatch || '\n' | '\n' | '\n' | '\x11' || '\n' | '\n' | '\x17' | '\x11' |
How do I read it: batch mode is actually enabled in three cases:
-B 'non-default marker'
;-b -B 'default marker'
;-b -B 'non-default marker'
.
While I would expect the following table:
-b || X | X | X | X || 'e' | 'e' | 'e' | 'e' |
------- || ---- | ---- | ------ | ------ || ------ | ---- | ------ | ------ |
-B || X | '' | '\x17' | '\x11' || X | '' | '\x17' | '\x11' |
======= || ==== | ==== | ====== | ====== || ====== | ==== | ====== | ====== |
EOBatch || '\n' | '\n' | '\x17' | '\x11' || '\x17' | '\n' | '\x17' | '\x11' |
Two cases added:
-b
;-B 'default marker'
.
In other words:
- it shouldn`t matter if user passed default marker or not: if it is passed implicitly, it is preferable value;
- if user said "enable batch mode", then we should use default EOB for batch mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*) And about forgotten quotes: it was totally OK; shell takes non-quoted stuff as a string unless it is a keyword or the string is followed by =
(meaning that it is variable assignment). Quotes are required for multi-word strings (though they could be written without quotes too, if spaces are escaped) and to escape special characters.
See:
mgolosova@artois-K:~$ a=abcdef
mgolosova@artois-K:~$ echo $a
abcdef
mgolosova@artois-K:~$ echo abcdef
abcdef
mgolosova@artois-K:~$ a=abc def
def: command not found
mgolosova@artois-K:~$ a=abc\ def
mgolosova@artois-K:~$ echo $a
abc def
mgolosova@artois-K:~$ echo abc\def
abcdef
mgolosova@artois-K:~$ echo abc\ def
abc def
mgolosova@artois-K:~$ echo "abc\ def"
abc\ def
mgolosova@artois-K:~$ echo "abc def"
abc def
So in case
it is OK to write strings without quotes (as values, not keywords or something are expected); moreover, if a wildcard '*' is used, quotes will escape it and it will be taken as an asterisk character, not a wildcard:
mgolosova@artois-K:~$ a=abcd
mgolosova@artois-K:~$ case $a in
> "a*")
> echo 'Quoted case: "a*"';;
> a*)
> echo 'Non quoted case: a*';;
> *)
> ;;
> esac
Non quoted case: a*
Yet in this case quotes are also OK to use, as no wildcard or something is used.
EOB="$2" | ||
shift;; | ||
-b|--batch) | ||
BATCHMODE="${2,,}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, why "${2,,}"
, not simply "$2"
?
There seem to be no difference:
[mgolosova@bamboo dkb]$ f () {
> echo "'${1,,}'"
> echo "'${1}'"
> }
[mgolosova@bamboo dkb]$ f enabled
'enabled'
'enabled'
[mgolosova@bamboo dkb]$ f e
'e'
'e'
[mgolosova@bamboo dkb]$ f
''
''
|
||
cmd="curl $ES_AUTH http://$ES_HOST:$ES_PORT/_bulk?pretty --data-binary @" | ||
|
||
load_files () { | ||
[ -z "$1" -o ! -f "$1" ] && log $(usage) && exit 1 | ||
[ -z "$1" -o ! -f "$1" ] && log NOTSET $(usage) && exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why NOTSET
? I`d say INFO
or ERROR
or, if the idea is to output usage
without log level -- use usage >&2
, plain usage
, or maybe even teach log
to work with skipped level parameter...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't decide which level (ERROR
or INFO
) that case belongs to. I decided to assign to it level 0 (NOTSET). I took into account PR #135, so why not to mention NOTSET directly and allow to handle log with skipped parameter as level 0?
data in the stream mode. | ||
Default: \n | ||
Kafka-style: \0 | ||
Kafka-style: \x17 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that as "batch mode" is formalized and properly defined, there`s no such thing as "Kafka style" any longer. Kafka style was informal description of "batch mode", as Kafka-supervised processing was the only case when batch mode was used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I suppose, it would be better to represent it as:
Dafault: \x17
No batch-mode: \n
@@ -163,7 +163,7 @@ upload_stream () { | |||
esac | |||
|
|||
while true; do | |||
while read -r -d "$delimiter" line; do | |||
while read -r -d "$delimiter" line || [[ -n "$line" ]]; do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concerning your question: I think, if we want to work in "strict" mode (no EOM -- no message, no EOB -- no batch), then this change is not required.
Yet thank you for noticing: I have missed this issue when wrote the code.
By the way: if this commit was to be kept, there must be issue description in the commit log, right? ;)
@@ -69,7 +69,8 @@ OPTIONS: | |||
} | |||
|
|||
upload_files () { | |||
|
|||
EOProcess="" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EOProcess
is explicitly defined in both upload_files
and upload_stream
. How user value is supposed to be used?
@@ -209,8 +211,12 @@ do | |||
MODE="${2,,}" | |||
shift | |||
;; | |||
-d|--delimiter) | |||
DELIMITER=`echo -e $2` | |||
-b|--batch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bracket is missed:
-b|--batch)
@@ -10,6 +10,17 @@ $(basename "$0") [-c CONFIG] [FILE...] | |||
PARAMETERS: | |||
CONFIG -- configuration file | |||
FILE -- file in NDJSON format for loading to Elasticsearch via bulk interface | |||
|
|||
OPTIONS: | |||
-b, --batch {e[abled]|d[isabled]} Specifies batch-mode: (e)nabled|(d)isabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could be handy to use this option as a flag: presented = enabled, missed = disabled.
Yet to make it possible to use it with value, we need a bit of invention in parsing:
BATCHMODE='disabled'
case $1 in
...
-b|--batch)
if [ -z "$2" ] || [[ "$2" == -* ]]; # it is last command line argument, or the next one starts with '-'
then
BATCHMODE='enabled'
else
BATCHMODE="$2"
shift
fi
;;
...
esac
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgolosova
That looks very useful. I think, I should apply it right now. Thank you! :)
@@ -19,14 +30,26 @@ ES_CONFIG="${base_dir}/../../Elasticsearch/config/es" | |||
|
|||
while [ -n "$1" ]; do | |||
case "$1" in | |||
--config|-c) | |||
-c|--config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change must be done in a separate commit: it has nothing to do with custom EOB/EOP, it is just formatting.
--) | ||
shift | ||
break;; | ||
-*) | ||
echo "Unknown option: $1" >&2 | ||
log ERROR "Unknown option: $key" $(usage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- This change has nothing to do with custom EOB/EOP.
$key
variable is not defined.
@@ -35,7 +58,7 @@ while [ -n "$1" ]; do | |||
shift | |||
done | |||
|
|||
log "Loading defaults and config $ES_CONFIG if any" | |||
log INFO "Loading defaults and config $ES_CONFIG if any" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change has nothing to do with custom EOB/EOP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it appear in another PR (I'm thinking about to move everything related with log() into another PR and formalize all mentioned connectors) or in a separated commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, separate PR is a great idea.
I always miss the moment when a set of small changes becomes not so small and deserves personal branch...
|
||
log "Putting data to ES" | ||
log INFO "Putting data to ES" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change has nothing to do with custom EOB/EOP.
8fad081
to
7775bed
Compare
*Adding help *Following options were implemented: EOP -- End-of-process EOB -- End-of-batch (old: $DELIMITER) BATCHMODE -- an option to support batch-mode processing. For `read -d` in bash EOB (as a delimiter) should be not more that one byte. For batch-mode processing that option should be enabled: -b|--batch (e)enabled|(d)isabled Default: disabled If EOB (-B|--eob) is specifiend with a custom marker (not "\x17" -- default), batch-mode will be enabled anyway. In other cases batch-mode should be enabled. Note: Empty EOB = disabled batch mode.
743bd53
to
aa7f30a
Compare
Fixing typo. Variable was changed: EOBatch --> EOB. It's supposed to use EOB/EOP variables to keep markers passed from a comand line unchangeable. The script works mostly with EOProcess/EOBatch variables.
There are several bugs that were fixed: 1. 'case' -- forgotten quotes. 'Case' that works with $BATCHMODE cheks the possibility to work in a batch mode. 2. Fixing EOB in reading options from a command line. It's supposed to use variables as they were passed via command line and to output/use for reading with 'echo -ne' statement. 3. Fixing reading with $delimiter local variable in upload_stream(). $delimiter keeps $EOBatch vatiable, and without new construction it's impossible to read with such kind of delimiters.
This PR is the old version of #PR179. |
Sink-connectors: 060, 069 -- support only EOP/EOB.
Source-connector: 010.
Batch-mode option added. That option allows to switch between two states when batch mode is enabled or disabled.
Default value for EOB: ETB (End-of-Transmission-Block character -- '\x17').