Wrong complexity class #2

conartist6 · 2020-07-24T16:55:40Z

I was just looking for references on parser core implementation, and I happened to find this repo. I thought it worth pointing out that this implementation has n squared complexity (on the number of tokens) where it should be linear. This is the result of using shift and unshift to manipulate the tokens array instead of push and pop. Shift is, of course, an O(n) operation on an array as every existing value must move to a new index.

jeromew · 2020-07-25T08:38:17Z

Hello,

Things are probably less clear cut in Javascript/v8 than what you mention in your issue.
Cf for example this discussion regarding list operations in perl where it explains why common scenarios are O(1)

The consequence of this implementation is that all of perl's primitive list operators (insertion, fetching, determining array size, push, pop, shift, unshift, etc.) perform in O(1) time.

There are probably similar optimizations in Javascript/v8.

Did you write benchmarks highlighting the "n squared complexity" of the token-stream module or comparing the complexity of the Javascript/v8 implementation of push, pop, shift, unshift ?

conartist6 · 2020-07-25T14:52:37Z

I've never heard anyone claim that JS shift/unshift are O(1). I did in fact do some research to verify before I opened this ticket. SpiderMonkey optimizes by copying the entire array into new memory instead of moving every element, but it's still not the right complexity class. See: https://stackoverflow.com/questions/44031591/performance-of-array-push-vs-array-unshift.

conartist6 · 2020-07-25T15:08:08Z

Here's more data for you: https://jsperf.com/array-push-vs-unshift. No graph, but it confirms that v8 also does not implement an O(1) opt.

jeromew · 2020-07-25T20:17:59Z

Interesting it seems like it could be interesting to investigate. Be sure that I was not judging you in my message and I am sorry if my tone sounded dubious. it is just that sometimes v8 auto-optimisations go a long way and I am surprised that unshift gets a so bad performance compared to what perl seem to be doing.

I did not participate in the token-stream module did you read the code and does it seem easy to you to "reverse" the implementation (using push vs unshift) or would it need more work ?

jeromew · 2020-07-25T20:29:08Z

I just opened the code I did not realize it is a very small module beeing mainly a wrapper around the shift/unshift functions.

Maybe it is easy to swap things and test it all with pug-parser. It could indeed make a performance improvement if the O(n) complexity is heavily hit during the parsing that would be a great discovery !

conartist6 · 2020-07-25T20:37:15Z

Yeah that's kind of the curse of JS -- perf isn't explicit and it's hard to keep track of which optimizations are where and whether they can be relied on.

Looking at the code the weirdest bit is that an external array is mutated. Using this class may have side effects. If you copy the array then you could just reverse it in the constructor, swap all the access logic around, and you'd be fine. I thought you might even be able to replace array mutation with simple indexing, but then the defer method wouldn't work.

conartist6 · 2020-07-25T20:39:09Z

Since mutation of the passed array is surprising and not documented, I'd consider changing that behavior a semver-patch bugfix.

jeromew · 2020-07-27T13:06:15Z

I think that this module is mainly used as a wrapper around the array adding a sort of parsing oriented syntaxic sugar so dependent modules are probably aware and ok with the mutation that occurs.

I found this jsperf - https://jsperf.com/ilogico-reverse-and-pop-vs-shift - that seems to show that "reverse + pop" is nearly equivalent to shift (the manual reverse + pop is 2x better than shift though)

I think it needs experimentation to see if this would make a big difference of not for pugjs and if/where the reverse should be taking place. The tokens here are coming from pug-lexer that uses push to build up the array of tokens so it seems that the reverse will need to take place at some point if pop/push should be used in the parser instead of shift/unshift.

Or there are maybe modules like https://www.npmjs.com/package/fast-list but then that might cause a problem with lookahead(index) that would degrade its perf.

ForbesLindesay · 2022-09-02T02:05:32Z

@conartist6 Thanks for reporting this. You're totally right that this is probably causing significantly worse perf in pug than it should be. I wrote this a long time ago and have had to resolve issues just like this on other libraries I maintain (e.g. throat's heavily optimised queue implementation). I think the best option here will be to just track the index in the Array without actually removing values. Since the tokens array is usually short-lived, that should be pretty efficient.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong complexity class #2

Wrong complexity class #2

conartist6 commented Jul 24, 2020

jeromew commented Jul 25, 2020

conartist6 commented Jul 25, 2020 •

edited

Loading

conartist6 commented Jul 25, 2020

jeromew commented Jul 25, 2020

jeromew commented Jul 25, 2020

conartist6 commented Jul 25, 2020

conartist6 commented Jul 25, 2020

jeromew commented Jul 27, 2020

ForbesLindesay commented Sep 2, 2022

Wrong complexity class #2

Wrong complexity class #2

Comments

conartist6 commented Jul 24, 2020

jeromew commented Jul 25, 2020

conartist6 commented Jul 25, 2020 • edited Loading

conartist6 commented Jul 25, 2020

jeromew commented Jul 25, 2020

jeromew commented Jul 25, 2020

conartist6 commented Jul 25, 2020

conartist6 commented Jul 25, 2020

jeromew commented Jul 27, 2020

ForbesLindesay commented Sep 2, 2022

conartist6 commented Jul 25, 2020 •

edited

Loading