Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve sanitizing string #84

Closed
marimeireles opened this issue Nov 10, 2020 · 14 comments · Fixed by #85
Closed

Improve sanitizing string #84

marimeireles opened this issue Nov 10, 2020 · 14 comments · Fixed by #85
Labels
bug Something isn't working good first issue Good for newcomers mentored

Comments

@marimeireles
Copy link
Member

Two spaces instead of one will cause the tokenized function to grab a char space as a word.
This shouldn't happen.

@marimeireles marimeireles added bug Something isn't working good first issue Good for newcomers mentored labels Nov 10, 2020
@avinal
Copy link
Contributor

avinal commented Nov 10, 2020

Hello, I would like to work on this issue. Would you please explain this issue a bit? Thanks

@avinal
Copy link
Contributor

avinal commented Nov 10, 2020

I have taken a look and I think the function sanitize_string needs to be improved. And if I am correct then the function should remove double/more space characters and return a string with exactly one space between each word. Right?

std::string sanitize_string(const std::string& code)
{
/*
Cleans the code from inputs that are acceptable in a jupyter notebook.
*/
std::string aux = code;
aux.erase(
std::remove_if(
aux.begin(), aux.end(), [](char const c) {
return '\n' == c || '\r' == c || '\0' == c || '\x1A' == c;
}
),
aux.end()
);
return aux;

@marimeireles
Copy link
Member Author

Hey @avinal, of course! :)
So, the error is happening on xvega inputs, we have the following as an example:
%XVEGA_PLOT X_FIELD Level Y_FIELD Hitpoints MARK point WIDTH 100 HEIGHT 200 <> SELECT Level, Hitpoints FROM players
This works!
But if I add another space between Level and Y_FIELD for example:
%XVEGA_PLOT X_FIELD Level Y_FIELD Hitpoints MARK point WIDTH 100 HEIGHT 200 <> SELECT Level, Hitpoints FROM players
This will throw the err:
Error: This is not a valid command for SQLite XVega.

We're already treating the problem where we have a keyword that doesn't make sense, so that's not the problem here. The culprit for this bug is likely inside the utils.cpp file in the sanitize_string function (

std::string sanitize_string(const std::string& code)
). We need to think on a way of getting an infinite amount of spaces so we don't get this kind of error again. And any other improvements you think will be very useful too!

@marimeireles
Copy link
Member Author

Sorry for the delay, yeah :)
Exactly! But as I said, we should cover cases with more than 2 spaces too.

@avinal
Copy link
Contributor

avinal commented Nov 10, 2020

Thanks for the explanation. I am thinking about the tokenizer function too. It may be possible to deal with such cases at the time of tokenization rather than sanitizing the command string in advance. Since both are parsing functions, a combination of improvements in both can be more efficient/effective. I will do some experiments to get the best possible solution for this and let you know 👍🏼

@avinal
Copy link
Contributor

avinal commented Nov 10, 2020

As I was thinking modifying the tokenizer function did the job. Remember when we take input from I/O no matter how much space is added it takes only the string. So I just converted the vector-feeding loop to an input string stream. It is passing all tests on my machine. I have enabled a test too for the purpose by adding 2 and 3 spaces. Please have a look.

@marimeireles
Copy link
Member Author

Oh no, I pushed to the wrong branch and I closed the issue, sorry @avinal!
I fixed the test, but we're still having this problem only in xvega magics.
So, if you test a random string like we're doing in the tests, the tokenizer works. What's wrong here is, somewhere in the code we're not tokenizing the magics from xvega, please feel free to keep going if you're interested :)

@marimeireles
Copy link
Member Author

I'm going to merge some recent changes I made to the code, so if you keep working on this, make sure to git pull the recent changes. =)

@avinal
Copy link
Contributor

avinal commented Nov 11, 2020

Ok

Oh no, I pushed to the wrong branch and I closed the issue, sorry @avinal!
I fixed the test, but we're still having this problem only in xvega magics.
So, if you test a random string like we're doing in the tests, the tokenizer works. What's wrong here is, somewhere in the code we're not tokenizing the magics from xvega, please feel free to keep going if you're interested :)

No problem I will try to find the bug. Just have to ask if pull request #85 is an improvement or you gonna revert it?

@marimeireles
Copy link
Member Author

marimeireles commented Nov 11, 2020

We can leave it merged, sure! :)
Your approach is more refined.
I just merged the other PR, so you can pull again.
To test this case locally you can build the master branch and run this command: %XVEGA_PLOT X_FIELD Level Y_FIELD Hitpoints MARK point WIDTH 100 HEIGHT 200 <> SELECT Level, Hitpoints FROM players
If you're interested in creating a test for this you can have a look in this comment: #70 (comment), but this is going where I never went before, so idk how hard it is! so no need to do it, I'll merge your PR anyways :)
I'm not sure if I'll be around in the next couple of hours but will be back later today, feel free to ask anything, also here: https://gitter.im/QuantStack/Lobby lots of helpful people.
Thanks @avinal!

@avinal
Copy link
Contributor

avinal commented Nov 11, 2020

Actually the only issue I can see is no description for setting the development environment. It is not documented anywhere 😅. I will set it first so that I could run full tests locally too. Thanks for the gitter link 👍🏼

@avinal
Copy link
Contributor

avinal commented Nov 11, 2020

I had it all set up and ran a few tests manually. Nothing seems broken, at least the parts we are discussing here. Also, I took a look at the source code and it fascinates me. I would really like to be a contributor to this project.

Apart from that, I saw #70 also, I am thinking of implementing that, It would help us find bugs. Secondly, if you can tell how exactly things are NOT working, I can find it(I am good at debugging, it's time to put it to test 😄 ). The next thing, documentation is really horrible, I had to refer to dozens of sites to finally able to run it. At least the development environment should be properly documented.

So if everything is okay I would like to keep working on this. Thanks a lot

@marimeireles
Copy link
Member Author

Also, I took a look at the source code and it fascinates me. I would really like to be a contributor to this project.

Oh, cool! :D
It was recently refactored, still quite a few room for improvement though.

Secondly, if you can tell how exactly things are NOT working, I can find it(I am good at debugging, it's time to put it to test smile ).

hehe yeah, sure.
Maybe it's something weird on my machine! I'll record a screencast :)

@marimeireles
Copy link
Member Author

Hey @avinal I was wrong, my env was messed up or I wasn't running the last version of the xsqlite
But it's working like it should! :)
Thank you for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good first issue Good for newcomers mentored
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants