Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[absolete]Connectors' markers implementation -- 010, 060, 069 #157

Closed
wants to merge 12 commits into from

Conversation

anastasiakaida
Copy link
Contributor

Sink-connectors: 060, 069 -- support only EOP/EOB.
Source-connector: 010.

Batch-mode option added. That option allows to switch between two states when batch mode is enabled or disabled.
Default value for EOB: ETB (End-of-Transmission-Block character -- '\x17').

EOProcess: -E|--eop
EOMessage: -e|--eom

*No default EOM/EOP values specified
*EOM/EOP variables ($EOM, $EOP) are not applied now in this .sh script
If -p or -f specified -- it's "file mode".
Otherwise -- "stream mode".

$EOM and $EOP variables are internal and keep data from command line.
$EOMarkers and $EOProcess variables are for output.
Default:
(f)ile mode -- none
(s)tream mode -- '\0'
For batch-mode processing that option should be enabled:

-b|--batch (e)enabled|(d)isabled
Default: disabled

If EOB (-B|--eob) is specifiend with a custom marker
(not "\x17" -- default), batch-mode will be enabled anyway.

In other cases batch-mode should be enabled.

Note: Empty EOB = disabled batch mode.
@anastasiakaida
Copy link
Contributor Author

anastasiakaida commented Aug 10, 2018

@mgolosova

I've met some collisiong in #130, so I decided to close the previous one and open a new one. Please, notice me if I forgot to fix something. :)

PS: I still don't know that to do with the last line fix in 19175cd. I would like to use 'strict realization', but I'm not sure what to do firstly. Should I leave it as it is now or make some improvements?

@@ -119,7 +120,8 @@ upload_files () {
;;
esac

eval "$cmd" || { echo "(ERROR) An error occured while uploading file: $INPUTFILE" >&2; continue; }
eval "$cmd" || { echo "(ERROR) An error occured while uploading file: $INPUTFILE" >&2; } &
echo -ne "$EOProcess"
Copy link
Collaborator

@mgolosova mgolosova Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened here?
"custom EOP implementation" makes me think that the answer should be "Custom EOP marker output added". But the diff says:

  • continue directive removed;
  • eval "$cmd" || echo is sent to background;
  • custom EOP output added.

What are the first two changes are made for?

Update: by the way, I see I made a comment to this change in #130; now I see addressed only the part concerning continue (and it was done quite silently, as if anyone reading this commit must know about our conversation).
If you think the continue directive should be deleted -- explain it in commit log; if you just follow my faint comment "I`m not sure if it is needed here..." -- then do not. Ask why it is not needed. Then think, does this change belong to this commit or not. Please, do not try to guess what do I expect you to do... if you know -- do it and show that you understand what and why was done. If not -- ask, and ask again if needed, and do it till you understand the idea and can explain even better than me.
What about everything else I wrote there?

-d|--delimiter)
DELIMITER=`echo -e $2`
-b|--batch
BATCHMODE="${2,,}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why "${2,,}", not simply "$2"?

[ -z "$DELIMITER" ] && DELIMITER=$'\0'
[ "x$DELIMITER" = "xNOT SPECIFIED" ] && DELIMITER=$'\n'

[ -n "$EOB" ] && EOBatch="$EOB"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What the EOB variable is used for? It is not assigned anywhere in the code...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgolosova

I found an implicit mistake in my code because of that question. Thank you!

The thing is that I named all variables which keeps parameters from a comand line like EOM/EOP/EOB (my mistake is that I named it as "EOBatch"). In 010 connector I saw that it would be better to pass markers from another variables EOMessage/EOProcess, because it was a little bit inconvenient for me (according to the code's logic) to use variables EOM/EOP.

I hope, after renaming it'll become much more clear. I'll create an extra commit with explanation of my decision.

shift
;;
-B|--eob)
EOBatch=`echo -ne $2`
Copy link
Collaborator

@mgolosova mgolosova Aug 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we have quite a long discussion about this mistake I`ve done when wrote this script :)
What happens when echo -ne $2 is used?

  1. If $2 was "\0" or "\n", the value of EOBatch will be the empty string, as shell trims all empty/space symbols from command substitution result. See in the example with \n:
[mgolosova@bamboo dkb]$ echo -e "abcde\n\n\n"                                                                                                                                                                
abcde



[mgolosova@bamboo dkb]$ echo `echo -ne "abcde\n\n\n"`
abcde
[mgolosova@bamboo dkb]$ 
  1. Even if $2 was not a code of "empty/space" symbol: here you get the interpreted value, while the default value is not interpreted: EOBatch='\x17'. What do you pass to upload_stream? Interpreted value or not? If not -- what will be used as a delimiter in read -r -d "$delimiter"?
    See (mind 0a for newline and 0a 17 for newline+EOB in hexdump):
[mgolosova@bamboo dkb]$ EOBatch='\x17'                                                                                                                                                                             
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
>   | hexdump -C
00000000  6d 65 73 73 61 67 65 5f  31 0a 6d 65 73 73 61 67  |message_1.messag|
00000010  65 5f 32 0a 17 6d 65 73  73 61 67 65 5f 33 0a 6d  |e_2..message_3.m|
00000020  65 73 73 61 67 65 5f 34  0a 17                    |essage_4..|
0000002a
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
>   | while read -r -d "$EOBatch" line; do echo $line; done
[mgolosova@bamboo dkb]$ echo -n "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
>   | while read -r -d "$EOBatch" line; do echo $line; done                                                         
message_1
nmessage_2
n
x17message_3
nmessage_4
n
[mgolosova@bamboo dkb]$ echo -ne "message_1\nmessage_2\n${EOBatch}message_3\nmessage_4\n${EOBatch}" \
>   | while read -r -d "`echo -ne $EOBatch`" line; do echo $line; done                                             
message_1 message_2
message_3 message_4
[mgolosova@bamboo dkb]$ 

Resume:

  • variable should always contain interpreted or not interpreted value, whenever it was (re)assigned;
  • to read should be passed interpreted value;
  • we have to do something with "empty/space" symbols: read uses \0 as delimiter if empty string is passed, but I have no idea right now how to pass it \n :)

log ERROR "Unexpected batch-mode parameter." && { [ -n "$BATCHMODE" ] && usage && return 2; }
break
;;
esac
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I`m getting this case statement.
What makes me puzzled:

  • BATCHMODE is just assigned with "d" (rewriting user value); why use case $BATCHMODE, when we already know the value for sure?
  • EOBatch is just assigned with default value '\x17'; why compare it with "\x17" again?
  • after this case, EOBatch will stay assigned with '\x17' only when batchmode is enabled and EOB is not empty;
  • after this case, EOBatch will be assigned with value of EOB only when batchmode is disabled and EOB is not empty and not "\x17".

What did you mean to do?

;;
d|disabled)
( [ -z "$EOB" ] || [ $EOB == "\x17" ] || [ $EOBatch == "\x17" ] ) && EOBatch="\n"
( [ -n "$EOB" ] && [ ! $EOB == "\x17" ] ) && EOBatch="$EOB"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I`m getting this case statement properly... It is similar to the one puzzled me in 069_upload2es/load_data.sh, but here we have slightly different initial conditions...

First, what does it say now (suggesting, that EOB should be used as EOP, for custom value passed with --eob):
"If batch mode enabled, EOBatch is \n unless EOB is defined. If batch mode is disabled, EOBatch is \n unless EOB is defined with anything but \x17 -- else EOBatch is EOB."

Concerning conditions... as we have this lines above:

...
    -B|--eob)
      EOB=`echo -ne $2` # allowed myself to replace EOBatch with EOB
...
[ -n "$EOB" ] && EOBatch="$EOB"

within the case:

  • if EOB is empty, EOBatch should be "\x17" (the default value);
  • if EOB is "\x17" (or, better to say, if user passed --eob \x17 in the command line), then EOB is not string "\x17", but ASCII code ETB;
  • if, after all, EOB is string "\x17", then EOBatch is "\x17" too.

So, in plain words -- what should be done during this case?

Copy link
Contributor Author

@anastasiakaida anastasiakaida Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgolosova

I supposed that case allows here to check the following conditions:

if batch mode is disabled (by hands or by default):

  1. no EOB = --eob '' --> $EOB='\n'
  2. we have EOB, but \x17 (hex) --> $EOB='\n' (it might be unnecessary and too strict, but I'm not sure)
  3. we have any other EOB --> $EOB='EOB'

if batch mode is enabled:

  1. only if we type --eob '' --> $EOB='\n'

UPD: it seems I forgot about quotes at all in case, shame on me. :(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright. I think there`s a little bit of mess happened with enabled/disabled batch mode etc...


First, about this (just to make sure we do understand things same way).
@anastasiakaida wrote:

we have EOB, but \x17 (hex) --> $EOB='\n' (it might be unnecessary and too strict, but I'm not sure)

This checks:

[ ! $EOB == "\x17" ]
[ $EOB == "\x17" ]

are string comparison. Works this way:

mgolosova@artois-K:~$ a_str="\x17"
mgolosova@artois-K:~$ a_ascii=$'\x17'
mgolosova@artois-K:~$ echo "str: '$a_str'; ASCII: '$a_ascii'"
str: '\x17'; ASCII: '�'
mgolosova@artois-K:~$ [ "$a_str" == "\x17" ] && echo "True" || echo "False"
True
mgolosova@artois-K:~$ [ "$a_ascii" == "\x17" ] && echo "True" || echo "False"
False

Now EOB is assigned with string value, so it`s OK; just wanted to make things clear.


Next, about the general logic.
I have tested it with attached script: test_batchmode.txt

Here`s the results:

   -b   ||  X   |  X   |   X    |   X    || 'e'  | 'e'  |  'e'   |  'e'   |
------- || ---- | ---- | ------ | ------ || ---- | ---- | ------ | ------ | 
   -B   ||  X   |  ''  | '\x17' | '\x11' ||  X   |  ''  | '\x17' | '\x11' |
======= || ==== | ==== | ====== | ====== || ==== | ==== | ====== | ====== |
EOBatch || '\n' | '\n' |  '\n'  | '\x11' || '\n' | '\n' | '\x17' | '\x11' |

How do I read it: batch mode is actually enabled in three cases:

  • -B 'non-default marker';
  • -b -B 'default marker';
  • -b -B 'non-default marker'.

While I would expect the following table:

   -b   ||  X   |  X   |   X    |   X    ||  'e'   | 'e'  |  'e'   |  'e'   |
------- || ---- | ---- | ------ | ------ || ------ | ---- | ------ | ------ | 
   -B   ||  X   |  ''  | '\x17' | '\x11' ||   X    |  ''  | '\x17' | '\x11' |
======= || ==== | ==== | ====== | ====== || ====== | ==== | ====== | ====== |
EOBatch || '\n' | '\n' | '\x17' | '\x11' || '\x17' | '\n' | '\x17' | '\x11' |

Two cases added:

  • -b;
  • -B 'default marker'.

In other words:

  • it shouldn`t matter if user passed default marker or not: if it is passed implicitly, it is preferable value;
  • if user said "enable batch mode", then we should use default EOB for batch mode.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*) And about forgotten quotes: it was totally OK; shell takes non-quoted stuff as a string unless it is a keyword or the string is followed by = (meaning that it is variable assignment). Quotes are required for multi-word strings (though they could be written without quotes too, if spaces are escaped) and to escape special characters.
See:

mgolosova@artois-K:~$ a=abcdef
mgolosova@artois-K:~$ echo $a
abcdef
mgolosova@artois-K:~$ echo abcdef
abcdef
mgolosova@artois-K:~$ a=abc def
def: command not found
mgolosova@artois-K:~$ a=abc\ def
mgolosova@artois-K:~$ echo $a
abc def
mgolosova@artois-K:~$ echo abc\def
abcdef
mgolosova@artois-K:~$ echo abc\ def
abc def
mgolosova@artois-K:~$ echo "abc\ def"
abc\ def
mgolosova@artois-K:~$ echo "abc def"
abc def

So in case it is OK to write strings without quotes (as values, not keywords or something are expected); moreover, if a wildcard '*' is used, quotes will escape it and it will be taken as an asterisk character, not a wildcard:

mgolosova@artois-K:~$ a=abcd
mgolosova@artois-K:~$ case $a in
> "a*")
>   echo 'Quoted case: "a*"';;
> a*)
>   echo 'Non quoted case: a*';;
> *)
>   ;;
> esac
Non quoted case: a*

Yet in this case quotes are also OK to use, as no wildcard or something is used.

EOB="$2"
shift;;
-b|--batch)
BATCHMODE="${2,,}"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why "${2,,}", not simply "$2"?
There seem to be no difference:

[mgolosova@bamboo dkb]$ f () {
>   echo "'${1,,}'"
>   echo "'${1}'"
> }
[mgolosova@bamboo dkb]$ f enabled
'enabled'
'enabled'
[mgolosova@bamboo dkb]$ f e
'e'
'e'
[mgolosova@bamboo dkb]$ f
''
''


cmd="curl $ES_AUTH http://$ES_HOST:$ES_PORT/_bulk?pretty --data-binary @"

load_files () {
[ -z "$1" -o ! -f "$1" ] && log $(usage) && exit 1
[ -z "$1" -o ! -f "$1" ] && log NOTSET $(usage) && exit 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why NOTSET? I`d say INFO or ERROR or, if the idea is to output usage without log level -- use usage >&2, plain usage, or maybe even teach log to work with skipped level parameter...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgolosova

I couldn't decide which level (ERROR or INFO) that case belongs to. I decided to assign to it level 0 (NOTSET). I took into account PR #135, so why not to mention NOTSET directly and allow to handle log with skipped parameter as level 0?

data in the stream mode.
Default: \n
Kafka-style: \0
Kafka-style: \x17
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that as "batch mode" is formalized and properly defined, there`s no such thing as "Kafka style" any longer. Kafka style was informal description of "batch mode", as Kafka-supervised processing was the only case when batch mode was used.

Copy link
Contributor Author

@anastasiakaida anastasiakaida Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgolosova

Got it. I suppose, it would be better to represent it as:

Dafault: \x17
No batch-mode: \n

@@ -163,7 +163,7 @@ upload_stream () {
esac

while true; do
while read -r -d "$delimiter" line; do
while read -r -d "$delimiter" line || [[ -n "$line" ]]; do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerning your question: I think, if we want to work in "strict" mode (no EOM -- no message, no EOB -- no batch), then this change is not required.

Yet thank you for noticing: I have missed this issue when wrote the code.

By the way: if this commit was to be kept, there must be issue description in the commit log, right? ;)

@@ -69,7 +69,8 @@ OPTIONS:
}

upload_files () {

EOProcess=""

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EOProcess is explicitly defined in both upload_files and upload_stream. How user value is supposed to be used?

@@ -209,8 +211,12 @@ do
MODE="${2,,}"
shift
;;
-d|--delimiter)
DELIMITER=`echo -e $2`
-b|--batch
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bracket is missed:
-b|--batch)

@@ -10,6 +10,17 @@ $(basename "$0") [-c CONFIG] [FILE...]
PARAMETERS:
CONFIG -- configuration file
FILE -- file in NDJSON format for loading to Elasticsearch via bulk interface

OPTIONS:
-b, --batch {e[abled]|d[isabled]} Specifies batch-mode: (e)nabled|(d)isabled.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be handy to use this option as a flag: presented = enabled, missed = disabled.
Yet to make it possible to use it with value, we need a bit of invention in parsing:

BATCHMODE='disabled'
case $1 in
  ...
  -b|--batch)
    if [ -z "$2" ] || [[ "$2" == -* ]]; # it is last command line argument, or the next one starts with '-'
    then
      BATCHMODE='enabled'
    else
      BATCHMODE="$2"
      shift
    fi
    ;;
  ...
esac

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgolosova
That looks very useful. I think, I should apply it right now. Thank you! :)

@@ -19,14 +30,26 @@ ES_CONFIG="${base_dir}/../../Elasticsearch/config/es"

while [ -n "$1" ]; do
case "$1" in
--config|-c)
-c|--config)
Copy link
Collaborator

@mgolosova mgolosova Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change must be done in a separate commit: it has nothing to do with custom EOB/EOP, it is just formatting.

--)
shift
break;;
-*)
echo "Unknown option: $1" >&2
log ERROR "Unknown option: $key" $(usage)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. This change has nothing to do with custom EOB/EOP.
  2. $key variable is not defined.

@@ -35,7 +58,7 @@ while [ -n "$1" ]; do
shift
done

log "Loading defaults and config $ES_CONFIG if any"
log INFO "Loading defaults and config $ES_CONFIG if any"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has nothing to do with custom EOB/EOP.

Copy link
Contributor Author

@anastasiakaida anastasiakaida Aug 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mgolosova

Should it appear in another PR (I'm thinking about to move everything related with log() into another PR and formalize all mentioned connectors) or in a separated commit?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, separate PR is a great idea.
I always miss the moment when a set of small changes becomes not so small and deserves personal branch...


log "Putting data to ES"
log INFO "Putting data to ES"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change has nothing to do with custom EOB/EOP.

@anastasiakaida anastasiakaida force-pushed the stages-010-060-069-custom-markers branch from 8fad081 to 7775bed Compare August 14, 2018 13:17
*Adding help
*Following options were implemented:

EOP -- End-of-process
EOB -- End-of-batch (old: $DELIMITER)
BATCHMODE -- an option to support batch-mode processing.

For `read -d` in bash EOB (as a delimiter)  should be not more that one byte.
For batch-mode processing that option should be enabled:

-b|--batch (e)enabled|(d)isabled
Default: disabled

If EOB (-B|--eob) is specifiend with a custom marker
(not "\x17" -- default), batch-mode will be enabled anyway.

In other cases batch-mode should be enabled.
Note: Empty EOB = disabled batch mode.
@anastasiakaida anastasiakaida force-pushed the stages-010-060-069-custom-markers branch from 743bd53 to aa7f30a Compare August 14, 2018 14:20
Fixing typo.
Variable was changed: EOBatch --> EOB.

It's supposed to use EOB/EOP variables to keep markers
passed from a comand line unchangeable. The script works mostly
with EOProcess/EOBatch variables.
There are several bugs that were fixed:

1. 'case' -- forgotten quotes. 'Case' that works with $BATCHMODE
cheks the possibility to work in a batch mode.

2. Fixing EOB in reading options from a command line.
It's supposed to use variables as they were passed via command line
and to output/use for reading with 'echo -ne' statement.

3. Fixing reading with $delimiter local variable in upload_stream().
$delimiter keeps $EOBatch vatiable, and without new construction it's
impossible to read with such kind of delimiters.
@anastasiakaida anastasiakaida changed the title Connectors' markers implementation -- 010, 060, 069 [absolete]Connectors' markers implementation -- 010, 060, 069 Nov 1, 2018
@anastasiakaida
Copy link
Contributor Author

anastasiakaida commented Nov 8, 2018

This PR is the old version of #PR179.
This PR will be closed and saved till #PR179 will be undone or closed without a merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants