Interesting, could you share the benchmark scripts for the second one?
---
Taking a quick look at the Backtesting.py source code, it appears to use Pandas and NumPy under the hood, so it isn't a pure-Python solution. The math itself is going to be pretty fast, but the glue around it is still Python... That might make it more difficult to do a head-to-head comparison.

Sounds nice! Maybe there is still room for improvement, but 4x is not bad.
In theory, in the worst case Pallene should only be as slow as Lua, never slower. Thanks for finding this. If it's reproducible, we should certainly file a bug report! If there really is a problem, Pallene's custom string builtins and io.write are candidates for the culprit. However, before jumping to that conclusion we should probably profile it... Does anyone remember what the Linux incantations for that are? I think it might be possible with perf.
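If memory serves, it goes something along these lines (assuming perf is installed; the script name is just a placeholder, and the flags are from memory, so double-check the man page):

```
perf record -g lua main.lua    # sample the whole run, recording call graphs
perf report                    # browse the hot spots interactively
```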
I wouldn't expect startup times to be a big deal. Pallene isn't a JIT compiler that does a lot of work at startup. If the total time is more than one second, then the startup time should be negligible.
---
Hi, I'm not doing any serious benchmarking yet, but I have been spot checking here and there to see what Pallene's performance is like.
So, for some context: I've been writing a little stock backtester. The basic idea is that you load a bunch of historical stock data and iterate through it. For each tick, you run some custom rules depending on what ideas you want to test. Usually this means computing various technical indicators (which mostly involves running through arrays and doing math operations) and making decisions (branches) to buy and sell.
I speculated that Pallene might do well in this type of use case, particularly the iterating through arrays and doing math. I was hoping that native execution, plus lower overhead (helping things like cache locality and prefetching), might yield good results in Pallene.
As a quick-and-dirty benchmark reference, I used the Python-based Backtesting.py library and a stripped-down version of their hello-world tutorial, which makes a simple buy/sell decision based on Simple Moving Average (SMA) cross-overs. I then implemented an approximate equivalent from scratch in Pallene (sketched below).
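The main loop is shaped roughly like this (a simplified sketch with made-up names; the SMA arrays are assumed to be pre-computed, as in the tutorial, and the real code has more bookkeeping):

```
local m: module = {}

-- Sketch: run the SMA cross-over rule over pre-computed
-- fast/slow moving averages, one tick at a time.
function m.run(prices: {float}, fast: {float}, slow: {float}): float
    local cash: float = 10000.0
    local shares: float = 0.0
    for i = 2, #prices do
        if fast[i] > slow[i] and fast[i-1] <= slow[i-1] and shares == 0.0 then
            -- fast SMA crossed above slow SMA: buy
            shares = cash / prices[i]
            cash = 0.0
        elseif fast[i] < slow[i] and fast[i-1] >= slow[i-1] and shares > 0.0 then
            -- fast SMA crossed below slow SMA: sell
            cash = shares * prices[i]
            shares = 0.0
        end
    end
    return cash + shares * prices[#prices]
end

return m
```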
I haven't done much Python development, so I immediately got first-hand experience of how slow Python is. Wow. And in this hello world, the SMA arrays are pre-computed at init, so there is no serious math crunching in the main loop. So this isn't just an issue of Python's big numbers being slow; many other things in Python are slow too.
Anyway, I digress. To make measuring a little clearer, I wrapped the main loop of each program in a for loop of 1000 iterations to extend the run times. This helps mitigate the effect of Python's slow loading times.
So, the results:

- Python: 58 seconds
- Pallene: 0.3 seconds
- Lua (using Pallene's --emit-lua): 0.9 seconds

So Pallene was about 3 times faster than pure Lua, and both were orders of magnitude faster than Python.
I then wrote a second, math-heavy experiment centered around options trading, which involves a lot more number crunching (including expensive operations like math.log). Unfortunately, Backtesting.py doesn't have any options support, so I could not compare with Python. But Pallene was about 4x faster than Lua in this case.
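To give a feel for the kind of math involved, the per-tick work looks something like the Black-Scholes d1 term (a sketch with made-up names; it assumes math.log and math.sqrt are among the math builtins your Pallene version supports):

```
local m: module = {}

-- Sketch of the per-tick number crunching: the Black-Scholes d1 term.
-- s = spot price, k = strike, r = risk-free rate,
-- sigma = volatility, t = time to expiry (in years).
function m.d1(s: float, k: float, r: float, sigma: float, t: float): float
    return (math.log(s / k) + (r + 0.5 * sigma * sigma) * t)
         / (sigma * math.sqrt(t))
end

return m
```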
So after that, I needed to implement some graphs for my backtester so I could visualize the results. I'm doing a quick-and-dirty thing, so I'm generating HTML/JS graphs that can be opened in a web browser. Basically, my code needs to take all the data in the program and write out a web page containing a JavaScript program that generates the desired graph. So there is a lot of string work here.
Since there are few string built-ins in Pallene right now, my code is pretty dumb: hard-coded string fragments written into the source, concatenated with on-the-fly values using the .. operator (roughly the pattern below).
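Concretely, the pattern is something like this (a made-up fragment; the real templates are much longer, and it assumes tostring is available as a builtin):

```
local m: module = {}

-- Sketch: emit one JavaScript statement per data point by
-- concatenating hard-coded fragments with values via ..
function m.emit_point(t: integer, price: float): string
    return "chartData.push({ x: " .. tostring(t) ..
           ", y: " .. tostring(price) .. " });\n"
end

return m
```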
What has surprised me is that Pallene seems to be slower than Lua (via --emit-lua) in this case. I compiled with -O2 and -O3 to make sure it wasn't just a C compiler optimization problem, but Pallene was still slower. I didn't investigate further, so I don't know whether the bottleneck is the actual string code or Pallene's io.write function.
- Pallene: 1.12 seconds
- Lua: 0.27 seconds
(I did not loop these 1000 times, and this measurement covers just the graph generation function, not the entire program, whereas earlier I was timing the whole program from launch to exit.)
Also, I think Pallene's startup time may be a little slower, although I didn't do a clean measurement of startup times. But that startup time is negligible compared to the string-handling difference. I'm guessing this means loading the .so takes longer than loading/interpreting the emitted Lua script.
Anyway, I was wondering whether this performance issue was already known, and I thought I would mention it in case it wasn't.
I'm thinking of rewriting/moving the graph generation code to Lua anyway, because I think I want to use string.gsub to make it easier to insert code into my JavaScript template (something like the sketch below).
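For example, in plain Lua, string.gsub with a replacement table makes the template much easier to maintain (all names here are illustrative):

```
-- Plain Lua sketch: fill a JavaScript template via string.gsub,
-- looking up each $PLACEHOLDER in a replacement table.
local template = [[
var chart = makeChart("$TITLE");
chart.setData($DATA);
]]

local function render(title, data_json)
    -- the parentheses drop gsub's second return value (the match count)
    return (template:gsub("%$(%u+)", { TITLE = title, DATA = data_json }))
end

io.write(render("SMA cross-over results", "[1.0, 2.0, 3.0]"))
```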