Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong complexity class #2

Open
conartist6 opened this issue Jul 24, 2020 · 9 comments
Open

Wrong complexity class #2

conartist6 opened this issue Jul 24, 2020 · 9 comments

Comments

@conartist6
Copy link

I was just looking for references on parser core implementation, and I happened to find this repo. I thought it worth pointing out that this implementation has n squared complexity (on the number of tokens) where it should be linear. This is the result of using shift and unshift to manipulate the tokens array instead of push and pop. Shift is, of course, an O(n) operation on an array as every existing value must move to a new index.

@jeromew
Copy link

jeromew commented Jul 25, 2020

Hello,

Things are probably less clear cut in Javascript/v8 than what you mention in your issue.
Cf for example this discussion regarding list operations in perl where it explains why common scenarios are O(1)

The consequence of this implementation is that all of perl's primitive list operators (insertion, fetching, determining array size, push, pop, shift, unshift, etc.) perform in O(1) time.

There are probably similar optimizations in Javascript/v8.

Did you write benchmarks highlighting the "n squared complexity" of the token-stream module or comparing the complexity of the Javascript/v8 implementation of push, pop, shift, unshift ?

@conartist6
Copy link
Author

conartist6 commented Jul 25, 2020

I've never heard anyone claim that JS shift/unshift are O(1). I did in fact do some research to verify before I opened this ticket. SpiderMonkey optimizes by copying the entire array into new memory instead of moving every element, but it's still not the right complexity class. See: https://stackoverflow.com/questions/44031591/performance-of-array-push-vs-array-unshift.

@conartist6
Copy link
Author

Here's more data for you: https://jsperf.com/array-push-vs-unshift. No graph, but it confirms that v8 also does not implement an O(1) opt.

@jeromew
Copy link

jeromew commented Jul 25, 2020

Interesting it seems like it could be interesting to investigate. Be sure that I was not judging you in my message and I am sorry if my tone sounded dubious. it is just that sometimes v8 auto-optimisations go a long way and I am surprised that unshift gets a so bad performance compared to what perl seem to be doing.

I did not participate in the token-stream module did you read the code and does it seem easy to you to "reverse" the implementation (using push vs unshift) or would it need more work ?

@jeromew
Copy link

jeromew commented Jul 25, 2020

I just opened the code I did not realize it is a very small module beeing mainly a wrapper around the shift/unshift functions.

Maybe it is easy to swap things and test it all with pug-parser. It could indeed make a performance improvement if the O(n) complexity is heavily hit during the parsing that would be a great discovery !

@conartist6
Copy link
Author

Yeah that's kind of the curse of JS -- perf isn't explicit and it's hard to keep track of which optimizations are where and whether they can be relied on.

Looking at the code the weirdest bit is that an external array is mutated. Using this class may have side effects. If you copy the array then you could just reverse it in the constructor, swap all the access logic around, and you'd be fine. I thought you might even be able to replace array mutation with simple indexing, but then the defer method wouldn't work.

@conartist6
Copy link
Author

Since mutation of the passed array is surprising and not documented, I'd consider changing that behavior a semver-patch bugfix.

@jeromew
Copy link

jeromew commented Jul 27, 2020

I think that this module is mainly used as a wrapper around the array adding a sort of parsing oriented syntaxic sugar so dependent modules are probably aware and ok with the mutation that occurs.

I found this jsperf - https://jsperf.com/ilogico-reverse-and-pop-vs-shift - that seems to show that "reverse + pop" is nearly equivalent to shift (the manual reverse + pop is 2x better than shift though)

I think it needs experimentation to see if this would make a big difference of not for pugjs and if/where the reverse should be taking place. The tokens here are coming from pug-lexer that uses push to build up the array of tokens so it seems that the reverse will need to take place at some point if pop/push should be used in the parser instead of shift/unshift.

Or there are maybe modules like https://www.npmjs.com/package/fast-list but then that might cause a problem with lookahead(index) that would degrade its perf.

@ForbesLindesay
Copy link
Member

@conartist6 Thanks for reporting this. You're totally right that this is probably causing significantly worse perf in pug than it should be. I wrote this a long time ago and have had to resolve issues just like this on other libraries I maintain (e.g. throat's heavily optimised queue implementation). I think the best option here will be to just track the index in the Array without actually removing values. Since the tokens array is usually short-lived, that should be pretty efficient.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants