Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows issue fix #14

Open
wants to merge 5 commits into
base: seriatim
Choose a base branch
from
Open

Windows issue fix #14

wants to merge 5 commits into from

Conversation

pverkind
Copy link

@pverkind pverkind commented Jul 8, 2022

Fix for issue #13

Proposed solution:

  1. use quotation marks around "xxhash64(series) as gid" instead of escaping the spaces
  2. use a placeholder "LESSTHAN" instead of the less than sign, which is not allowed with CALL, and replace it with the less than character "<" in seriatim.py

Possible improvements: replace less than, greater than and pipe in user-defined arguments with LESSTHAN, GREATERTHAN and PIPE.

pverkind added 2 commits July 8, 2022 14:56
The solution to the spaces problem is to wrap the argument in quotation marks (this only works if there is only one pair of quotation marks!)
The problem with the less than character is that CALL does not allow the less than character (see https://docs.microsoft.com/en-us/windows-server/administration/windows-commands/call):  "Do not use pipes (|) or redirection symbols (< or >) with call." 
The solution could be to replace that character with a placeholder, and to replace that placeholder with a less than character in seriatim.py. I propose using LESSTHAN rather than the more common "LT" because "LT" is common enough in natural language to create problems if it is replaced with "<".
A solution to the fact that the less than character is not allowed in CALL (in passim.cmd); use a placeholder "LESSTHAN" which is replaced in the config object with "<". Same approach could be used for the other problematic characters in CALL: pipe and greater than.
@dasmiq
Copy link
Owner

dasmiq commented Jul 8, 2022

I'm not entirely happy with the ad hoc symbol names in seriatim.py for just that argument. I think the problem is that, on Windows, passim.cmd calls the other command seriatim.cmd, so effectively you have to escape the escape characters to get it to work. What if you just copied the contents of seriatim.cmd to passim.cmd and then added the --all-pairs, --fields and --filterpairs arguments to it?

@pverkind
Copy link
Author

pverkind commented Jul 8, 2022

I can see why you don't like that. I tried what you proposed now, but I don't seem to get it to work. The same problem seems to apply here.

When I copy the contents of seriatim.cmd into passim.cmd and change the last line to this, I get the same "The system cannot find the file specified" error as before:

spark-submit --repositories https://repos.spark-packages.org/ ^
  --packages graphframes:graphframes:0.8.0-spark3.0-s_2.12 ^
    %SPARK_SUBMIT_ARGS% %SCRIPT% %* --all-pairs --fields "xxhash64(series) as gid" -f gid<id2

(I tried also all the other combinations with caret escapes for the less than character and double quotation marks and backquotes)

But when I try with the LESSTHAN paramater, it works again:

spark-submit --repositories https://repos.spark-packages.org/ ^
  --packages graphframes:graphframes:0.8.0-spark3.0-s_2.12 ^
    %SPARK_SUBMIT_ARGS% %SCRIPT% %* --all-pairs --fields "xxhash64(series) as gid" -f gidLESSTHANid2

if you, @dasmiq or @mutherr , could give it a try and see if you can get it to work, that would be great.

@dasmiq
Copy link
Owner

dasmiq commented Jul 8, 2022

Did you try "gid<gid2" or gid^<gid2 ? Can you escape it with backslash, e.g., "gid\<gid2" or gid\<gid2 ?

@pverkind
Copy link
Author

pverkind commented Jul 8, 2022

I tried "gid^<gid2", "gid<gid2", gid^<gid2, gid<gid2, gid<gid2, gid<gid2 ; and because you asked, I now also tried gid<gid2 and "gid<gid2", all with the same effect

@pverkind
Copy link
Author

pverkind commented Jul 8, 2022

Do I understand correctly that your problem with my earlier proposal is mostly that it works only for this one argument?
If so, it would perhaps be possible to do a replace operation for all offending characters (<, >, |) on all arguments in passim.cmd and seriatim.cmd, and then do the inverse replacement on all (relevant) arguments in seriatim.py?

(I looked into find and replace in Windows CMD, and it does not seem straightforward, though)

pverkind added 2 commits July 9, 2022 00:43
Replace special characters <, >, |, & in all arguments (not just the built-in filterfields argument) with a placeholder
which will be replaced with the original character in seriatim.py
NB: unfortunately, this does not work with the equals sign; use _EQ_, _LTE_ and _GTE_ for such operators instead.
I added replacement of placeholders for special files to all fields I thought would need them.
@pverkind
Copy link
Author

pverkind commented Jul 8, 2022

I looked into the replace functionality within Windows batch scripts and it is not great; but I managed to add some code to passim.cmd that replaces all <, >, & and | characters with placeholders in all arguments, not just the built-in filterpairs argument.
I also added the reverse replacement in seriatim.py for the only other field I thought it was relevant for (more could be added of course).

Please ignore this if you think this is a horrible idea - but at least it makes passim work on Windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants