Discussion: how should we test the performance? #45
Comments
You might enjoy this video on Dave's Garage https://youtu.be/pSvSXBorw4A, where something similar was done. The issues I can see are the following.
However, this is from the viewpoint of "sticking to the book" and making this more intro-friendly. I've personally been thinking it would be cool to see different approaches/features after we clear the initial book. Maybe that involves hyper-optimizing for performance, maybe it involves outputting the AST as a PNG, or making a REPL, etc. Then we go back and see what has been done. That would make for good content on stream, and it would let some people flex their creative muscles. Then we can rank them based on how creative they are, which stays language-agnostic; performance isn't.
So I have some requirements for this to become a performance test, which, if you watch any of the videos I have done thus far, is something I will do. For this to become a performance test, I want to turn each of the steps into a server: serve out the tokens, possibly do the same with the parser, and then finally run the program itself and return the program's output. To me this is a proper performance test. It tests the language in a more realistic way: syscalls, memory usage, and all of that fun, instead of some toy example.
Second, as far as code goes, I want to have everything written in such a way that it is very simple. Then it would be fun to talk about the performance of each implementation and see how to improve it.
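A rough sketch of what the "each step becomes a server" idea above could look like, assuming a hypothetical `tokenize` function exported by the implementation's existing lexer; the endpoint path, port, and JSON shape are illustrative, not a settled spec:

```typescript
// Hypothetical sketch: wrap the existing lexer in a tiny HTTP server so that
// POST /lex takes Monkey source and serves the tokens back as JSON.
import { createServer } from "node:http";
import { tokenize } from "./lexer"; // hypothetical module exposing the lexer

createServer((req, res) => {
  if (req.method === "POST" && req.url === "/lex") {
    let source = "";
    req.on("data", (chunk) => (source += chunk));
    req.on("end", () => {
      res.setHeader("Content-Type", "application/json");
      res.end(JSON.stringify(tokenize(source)));
    });
  } else {
    res.statusCode = 404;
    res.end();
  }
}).listen(8080);
```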
Hitting me with the Tuesday callout D: Makes sense. I'm assuming you'll dig more into that on stream, so that we can make sure PRs properly maintain that?
For the server approach, do you mean that each language will run whichever server framework the author chooses, or would we keep the languages as CLI apps and just wrap them all with the same server? If we aren't trying to test which server framework is best, I would suggest we make each implementation a long-running CLI app, all called by the same server. The server would launch the app and then basically send requests to it (over stdin/stdout) like you would with a REPL. That would still exercise GC and whatnot; it just wouldn't require a dependency on server frameworks.
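A minimal sketch of that wrapper idea, assuming the implementation is a long-running process that accepts one Monkey program per line on stdin and prints one line of output per program; the binary path and the one-line-per-program protocol are assumptions:

```typescript
// Hypothetical shared wrapper: one HTTP server per implementation, piping each
// request body to a long-running CLI process over stdin and reading the reply
// from stdout, REPL-style.
import { createServer } from "node:http";
import { spawn } from "node:child_process";
import { createInterface } from "node:readline";

const impl = spawn("./bin/monkey-impl", { stdio: ["pipe", "pipe", "inherit"] }); // hypothetical path
const replies = createInterface({ input: impl.stdout! });
const waiting: Array<(line: string) => void> = [];

// Each line the implementation prints answers the oldest pending request.
replies.on("line", (line) => waiting.shift()?.(line));

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    waiting.push((line) => res.end(line));
    impl.stdin!.write(body.replace(/\n/g, " ") + "\n"); // one program per line
  });
}).listen(8080);
```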
I hate to jump too far ahead, but I am interested in whether we could have a better outline of how this will work. There are plenty of solutions that are going to be... interesting to benchmark, to say the least. Bash/Google Sheets/ChatGPT are ones that come to mind instantly. I know JJDSL would need some changes to accept stuff during runtime. I'm not looking for the final method, but at least how we could expect the input to happen. Just going to list off the ones that come to mind.
Likewise there is the output, but I'm of the opinion that an output is easier to handle than an input, so I'm less worried there.
Could probably do something like this, to include all the crumminess of the compilers/interpreters/JIT/etc. Probably average over a bunch of runs or something?

```
$ time cat testcode.monkey | foo-lang-impl > foo-lang-test-results
```
It's probably important to run/build the tests in Docker to avoid the nightmare of configuring the correct runtime environment for all the languages simultaneously. To that end, I added the following to my Makefile:

```makefile
docker-time: docker-build
	docker run -i -v $(shell pwd):/deez deez_$(notdir $(shell pwd)) time ./bin/TsRustZigDeez
```

so then I can just run it like this:

```
$ cat test.monkey
let ackerman = fn(m,n) if (m == 0) n + 1 else if (n == 0) ackerman(m-1, 1) else ackerman(m-1, ackerman(m, n-1))
ackerman(3,8)
$ cat test.monkey | make docker-time
docker build . -t deez_cpp-spongman
[+] Building 7.3s (13/13) FINISHED
...
docker run -i -v /home/piersh/ts-rust-zig-deez/cpp-spongman:/deez deez_cpp-spongman time ./bin/TsRustZigDeez
repl
> nil
> 2045
>
real    0m 2.65s
user    0m 2.24s
sys     0m 0.39s
```
So my personal thought on this is that for a language to be "a part of the test" we are going to make it MLAAS: POST /lex returns the JSONified tokens. That way we can test these perfs as we go. I am going to be building a client using Turso (ad) and Drizzle (not an ad, seems like a neat ORM to try out). That work will probably start today. Should be fun!
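For illustration only, a hedged sketch of what such a client round trip might look like; the URL, port, and token JSON shape are assumptions, and the Turso/Drizzle persistence is left out:

```typescript
// Hypothetical client for the proposed POST /lex endpoint: send Monkey source,
// read back the JSONified tokens, and record how long the round trip took.
const source = "let add = fn(x, y) { x + y; };";

const started = performance.now();
const response = await fetch("http://localhost:8080/lex", {
  method: "POST",
  headers: { "Content-Type": "text/plain" },
  body: source,
});
const tokens: Array<{ type: string; literal: string }> = await response.json();
console.log(`${tokens.length} tokens in ${(performance.now() - started).toFixed(1)} ms`);
```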
Interesting, I assume you mean HTTP POST? Does that mean each language needs to implement an HTTP server? What about the assembler guy? Does he have to write an HTTP server in asm? How do we isolate the performance of the language implementation from the performance of the HTTP server?
I'm assuming here, but we'd probably disregard the HTTP timing, so you wouldn't need to implement the HTTP server in your language; you would just have to wrap your solution. For example (in Java/Spring):

```java
@PostMapping("/rest/lexer")
public Long timeLexer(@RequestBody String monkeyScript) {
    long startTime = System.currentTimeMillis();
    Lexer lexer = new Lexer(monkeyScript);
    // Assume we parse all tokens.
    return System.currentTimeMillis() - startTime;
}
```

This would allow each instance to "disregard" most of the overhead of the HTTP and only return the rough actual time cost. The main exception I can think of would be the ASM implementation, which might have to deal with additional overhead in calling it, but they could probably just wrap it in some C/C++ and do it like this.
I see an obvious optimization there ;-)
I'll entertain you.

```java
//@PostMapping("/rest/lexer")
//public Long timeLexer(@RequestBody String monkeyScript) {
//    long startTime = System.currentTimeMillis();
      Lexer lexer = new Lexer(monkeyScript);
//    // Assume we parse all tokens.
//    return System.currentTimeMillis() - startTime;
//}
```

Since we're removing overhead, we only care about the `Lexer` construction itself.
IMO the interaction with the language implementation should just be via stdin/stdout (as this is what the book implements). This is the simplest thing that removes all other variables. If you want to wrap that in a standard web server that services requests, runs Docker, and pipes the requests/results in/out of it, that's fine, but I'm not entirely sure what you're testing at that point. There's no need to implement the timer inside the interpreter code.
If we're going to compare implementations, I only really care to see the time difference between the code itself; everything else feels like noise.
Yeah, Google Sheets & Scratch are going to require some kind of wrapper regardless of what everyone else uses. stdin/stdout just seems like the baseline, because that's what everyone (else) is already implementing. IMO startup time is a big factor: if the C runtime took 20 seconds to start, nobody would use it regardless of how efficient the compiled code was.
Depends on the context. A long-running server's startup time doesn't matter, since it's going to be on for a long time. If you have a client app/burst application, then it's going to matter more.
3Days has some HTTP stuff, so I think a web server in HolyC can be done. I do not know if it can be done without breaking compatibility with what it should actually be tested on, which is TempleOS itself.
Here on GitHub we can set up a simple and free workflow that runs whenever someone pushes their changes into the `master` branch. The results could then be downloaded at any given point in time for up to 90 days after the test, which means ThePrimeagen can pick an arbitrary point in time, download all the test results at once, and then parse them.
Now, the workflow can be different, but the question is: how do we test the speed of the interpreters doing lexing, for example? The `time` command? `perf stat`? @ThePrimeagen
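As one possible answer, a minimal sketch of a harness (an assumption, not a settled approach) that pipes the same test file into each implementation over stdin, repeats the run several times, and averages the wall-clock time, startup included; the binary paths and run count are made up:

```typescript
// Hypothetical benchmark harness: run each implementation over the same
// .monkey file N times and report the average wall-clock time per run.
import { execFileSync } from "node:child_process";
import { readFileSync } from "node:fs";

const source = readFileSync("testcode.monkey");
const implementations = ["./rust/target/release/deez", "./go/bin/deez"]; // hypothetical paths
const runs = 10;

for (const impl of implementations) {
  let totalMs = 0;
  for (let i = 0; i < runs; i++) {
    const start = process.hrtime.bigint();
    // Feed the program over stdin, discard the output; timing includes startup.
    execFileSync(impl, { input: source, stdio: ["pipe", "ignore", "ignore"] });
    totalMs += Number(process.hrtime.bigint() - start) / 1e6;
  }
  console.log(`${impl}: ${(totalMs / runs).toFixed(1)} ms average over ${runs} runs`);
}
```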