Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Native and WASM parsers behave differently #69

Open
wenkokke opened this issue Jan 4, 2022 · 11 comments
Open

Native and WASM parsers behave differently #69

wenkokke opened this issue Jan 4, 2022 · 11 comments

Comments

@wenkokke
Copy link
Contributor

wenkokke commented Jan 4, 2022

For instance, if we parse examples/postgrest/test/Main.hs:

  • using native, the top-level node is haskell;
  • using wasm, the top-level node is ERROR.

In this case, the problem seems to be absent module headers.

@414owen
Copy link
Contributor

414owen commented Jan 5, 2022

That's interesting, you can try running a -DDEBUG build, and see what the first few tokens the scanner produces are?

@wenkokke
Copy link
Contributor Author

wenkokke commented Jan 5, 2022

Not quite sure how to run a -DDEBUG build? You should be able to run these tests yourself, though. Lemme dig up the commands.

@wenkokke
Copy link
Contributor Author

wenkokke commented Jan 5, 2022

Check out #68 and run npm run examples-wasm. You'll need Node and Emscripten.

@414owen
Copy link
Contributor

414owen commented Jan 6, 2022

I'll give it a go.

I've documented -DDEBUG in enable scanner debug output here

@414owen
Copy link
Contributor

414owen commented Jan 6, 2022

Parsing examples/postgrest/test/Main.hs passes if you delete either both let statements, or the spec $ do ... block

@414owen
Copy link
Contributor

414owen commented Jan 6, 2022

I've created a list of the (smallest to largest) files that fail to parse under wasm:

{ for i in examples/**/*.hs; do out="$(2>&1 ./script/tree-sitter-parse.js $i)"; if echo "$out" | grep 'Parse error' > /dev/null; then wc -l "$i"; fi; done } | sort -n | tee list

output: here

The first file passes when you delete the -> in the GADT, or the type signature of test, or any two comment lines... I'll have a proper look this evening.

If you can spot anything that those files have in common, and others don't, let me know

@maxbrunsfeld
Copy link
Contributor

maxbrunsfeld commented Jan 6, 2022

One likely reason for the behavior difference is that tree-sitter’s get_column function currently returns a byte count, not a character count. The JavaScript bindings encode strings as UTF16, so most characters are 2 bytes instead of 1, as in UTF8.

The get_column method is still a bit experimental, and we’ve talked about changing it back to returning a character count. In the meantime, scanners should make sure to not make assumptions about the absolute values that method returns, but instead just compare the values to each other.

Not sure if that’s the problem here, but wanted to raise it, since it’s gotcha currently, and this is one of the few grammars affected.

@414owen
Copy link
Contributor

414owen commented Jan 6, 2022

Thanks @maxbrunsfeld, that sounds very plausible.

I wonder if I halve double the indentation on those files whether they'll break with the native parser...

edit they didn't

@ahelwer
Copy link

ahelwer commented Jan 7, 2022

It's about time I get off my butt and write good tests for my codepoint-based get_column tree-sitter fork so it can be merged upstream.

@wenkokke
Copy link
Contributor Author

I can confirm that this is mostly fixed using the latest version, the only file that doesn't parse using the web assembly bindings is <examples/haskell-language-server/test/testdata/format/BrittanyCRLF.formatted_range.hs>.

@maxbrunsfeld
Copy link
Contributor

Ok great! If anyone gets a chance, it'd be super useful to extract from that file a minimal example of how the WASM and native parsers behave differently. At this point, if they still behave differently, that seems most likely to be a bug in Tree-sitter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants