Get faster #13
Benchmarks are currently running about 3X slower than for commonmark.js. It would be good to understand why and narrow the gap.
Funny thing is that when I benchmark the commonmark.js CLI against the djot one, the djot one is always slightly faster. So I am not sure how to interpret these results. It could be that there's a bit more startup overhead in each conversion for djot (perhaps because of how parser objects are created) and that this predominates in repeated tests parsing very small texts. When longer texts are used, the difference in startup overhead gets swamped and djot comes out ahead. There may be some low-hanging fruit to be plucked, but I'd probably need guidance from someone who is more experienced in optimizing JS.
As a data point, for me djot takes about 1.75x more time than commonmark when rendering a 3.5 MB file. Here's a flamegraph for djot; I haven't looked at it yet: https://share.firefox.dev/3IuG1qg
Is this using it with node, or in the browser?
With node.
This part looks potentially optimizable:
If I understand this correctly, here we go through every "char" position and do a hashmap lookup. I think the map should be pretty sparse, so it'd be more efficient to convert matches to an array and sort them by pos rather than enumerating every pos.
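To make the suggestion concrete, here is a small sketch (hypothetical types and names, not the actual djot.js code) contrasting a scan over every character position against sorting just the entries that actually exist in a sparse map:

```typescript
interface Match { startpos: number; endpos: number; annot: string; }

// Approach 1: probe every position, even though most have no match.
function collectByScanning(matches: Map<number, Match>, len: number): Match[] {
  const out: Match[] = [];
  for (let pos = 0; pos < len; pos++) {   // O(len) lookups, mostly misses
    const m = matches.get(pos);
    if (m !== undefined) out.push(m);
  }
  return out;
}

// Approach 2: take only the entries that exist and sort them by position.
// For a sparse map this is O(k log k) with k much smaller than len.
function collectBySorting(matches: Map<number, Match>): Match[] {
  return Array.from(matches.values()).sort((a, b) => a.startpos - b.startpos);
}
```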
If we just used an array, we could perhaps just add the
Yeah, there's some water to squeeze out of this stone: if I replace array with
I am wondering: can we refer to an earlier match not by codepoint offset, but by its position in the matches array? That is, we accumulate events in an array; events are generally sorted by their startpos, but if we change our mind about some past events, we go to their position in the array and replace them with tombstones. I use a similar device in rust-analyzer: parsing events are accumulated into a flat array, but events can be tombstones, or they can refer to future events. (The backstory here is that, while brute-force replacement of array with map speeds things up,
Yes, this sounds like a plausible idea. The main change would be adding to

Other cases to handle:
And then we need to figure out what the tombstone is and how to strip it out. It could just be null, I suppose, and we could filter the array on

[I can try this.]
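A minimal sketch of the tombstone idea being discussed (hypothetical names, not the eventual implementation): matches are appended to a flat array, an earlier match is retracted by nulling its slot rather than deleting from a position-keyed map, and the nulls are filtered out when the matches are collected.

```typescript
interface Match { startpos: number; endpos: number; annot: string; }

// A slot holds either a match or a tombstone (null) left behind when we
// change our mind about an earlier match.
type Slot = Match | null;

const events: Slot[] = [];

function addMatch(startpos: number, endpos: number, annot: string): number {
  events.push({ startpos, endpos, annot });
  return events.length - 1;          // index can be used to refer back later
}

function retract(index: number): void {
  events[index] = null;              // tombstone instead of a costly delete
}

function getMatches(): Match[] {
  // Strip tombstones; the survivors are already (nearly) in startpos order.
  return events.filter((e): e is Match => e !== null);
}
```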
This yields a 10-15% speed boost in my tests. See #13. However, we now fail a couple of smart quote tests. I need to look into that.
I've made a stab at this in #24. I think the code is more elegant (and shorter) now. But two tests fail and I haven't yet gotten to the bottom of it.
This yields a significant performance boost. See #13. Of course, the speedup depends on the input. This will mostly affect texts with significant strings of inlines (longer paragraphs rather than just a few words). But on those, we are more than twice as fast with this change. Here are some inline-heavy benchmarks for comparison:

This branch:

parse inline-em-flat.dj x 24,362 ops/sec ±0.26% (95 runs sampled)
parse lorem1.dj x 13,244 ops/sec ±0.29% (101 runs sampled)
parse inline-em-nested.dj x 25,121 ops/sec ±0.23% (96 runs sampled)
parse inline-em-worst.dj x 27,437 ops/sec ±0.66% (92 runs sampled)
parse readme.dj x 2,529 ops/sec ±0.55% (96 runs sampled)

Current main:

parse inline-em-flat.dj x 25,057 ops/sec ±0.26% (99 runs sampled)
parse lorem1.dj x 5,846 ops/sec ±0.26% (98 runs sampled)
parse inline-em-nested.dj x 25,584 ops/sec ±0.26% (100 runs sampled)
parse inline-em-worst.dj x 26,859 ops/sec ±0.29% (98 runs sampled)
parse readme.dj x 1,144 ops/sec ±0.43% (96 runs sampled)

Thanks to @matklad for identifying this as a bottleneck and suggesting the general direction of the improvement.
Looked at the perf profile again this morning. Sadly, nothing more stands out: at this (function) level of granularity, the profile looks mostly flat, it's "stuff takes time" basically. I think we'll need some sort of line profiler to get further along, but quick googling didn't show how to get that with JS. Nonetheless:
I adjusted the commonmark.js profiling so it matches what we're doing (parsing only, constructing the object inside the profiling loop), but with commonmark versions of our dj files. The differences are rather astounding, though I must say I don't understand why the observed differences on real-world conversions are not very large:
And here is djot.js:
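For reference, the ops/sec figures quoted in this thread are Benchmark.js output; here is a hedged sketch of the adjusted commonmark.js harness described above (file names made up), measuring parsing only with the parser object constructed inside the timed function:

```typescript
import * as fs from "fs";
import Benchmark from "benchmark";
import { Parser } from "commonmark";

// Stand-in for a commonmark conversion of one of the .dj benchmark files.
const text = fs.readFileSync("bench/readme.md", "utf8");

new Benchmark.Suite()
  .add("parse readme.md", () => {
    // Parsing only, no rendering; the Parser is created inside the measured
    // function so per-conversion construction overhead is included.
    new Parser().parse(text);
  })
  .on("cycle", (event: any) => {
    // Prints e.g. "parse readme.md x N ops/sec ±x.xx% (n runs sampled)"
    console.log(String(event.target));
  })
  .run();
```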
PS. My memory of optimizing the commonmark.js parser was that it was largely a matter of doing things that made the v8 optimizer happy -- so, it wasn't like optimizing a regular program, where you look for hot spots (the sort of thing we've been doing), but more about doing magic tricks that suddenly made a huge difference.
Tried replacing `accumulatedText` with a string, but no improvement, slight slowdown in some cases.
I think your first two findings are the most promising. Something is megamorphic, and if we figure out what's triggering that it could have a big effect. Note that in commonmark.js, we just use a Node object that, essentially, has all the fields anything will need (
The second thing, avoiding things like
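To illustrate the shape point above (a hedged sketch, not the actual commonmark.js or djot.js Node; the field names are made up): giving every node the same set of fields, initialized up front, keeps V8 property lookups monomorphic, whereas adding fields ad hoc on some nodes but not others multiplies hidden classes.

```typescript
// Hypothetical node type: every field exists on every instance, even when
// unused, so all nodes share one hidden class and property access at hot
// call sites stays monomorphic.
class Node {
  readonly tag: string;
  children: Node[] = [];
  text: string | null = null;
  startpos: number = -1;
  endpos: number = -1;
  attributes: Record<string, string> | null = null;

  constructor(tag: string) {
    this.tag = tag;
  }
}

// By contrast, building nodes like this gives V8 a different shape for each
// combination of fields, which is what can make a hot lookup megamorphic:
//   const n: any = { tag: "emph" };
//   if (hasAttributes) n.attributes = {};   // new hidden class
//   if (isText) n.text = "...";             // another hidden class
```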
OK, I managed to remove the need for the
In a real-world test with converted versions of the same document, I'm getting:

djot.js: 2874 bytes/ms
After a few more optimizations, I've improved it a bit more:
This allows us to splice things like `-emph` into the appropriate place in the match list, avoiding the need to sort the whole list in `getMatches`. This yields a ~20% speed increase in inline-heavy text. See #13.
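As an illustration of the kind of sorted insertion the commit describes (hypothetical helper, not the commit's actual code): a binary search finds the slot for a late-arriving match such as `-emph`, so the list stays ordered and no full sort is needed afterwards.

```typescript
interface Match { startpos: number; endpos: number; annot: string; }

// Insert `m` so that `matches` stays ordered by startpos.
// Binary search finds the slot in O(log n); splice shifts the tail.
function insertSorted(matches: Match[], m: Match): void {
  let lo = 0;
  let hi = matches.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (matches[mid].startpos <= m.startpos) {
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  matches.splice(lo, 0, m);
}
```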
After 819f8f3 we're a bit faster:
After 1492366:
After ddb155a:
After 57c81ea:
After upgrading to node v16, times are slower. I note this here so we don't think changes in the code have caused the slower performance: