
Discussion: how should we test the performance? #45

Open · novusnota opened this issue May 26, 2023 · 21 comments

@novusnota (Contributor) commented May 26, 2023

Here on GitHub, we could set up a simple, free workflow. Whenever someone pushes changes to the master branch:

  1. A GitHub Actions workflow starts, targeting only the languages whose files changed in that push
  2. It runs a series of performance tests, say, for the Rust lexer (if the Rust implementation was updated)
  3. Upon completion, the workflow uploads the results as artifacts

Those results stay downloadable for up to 90 days after the run, so ThePrimeagen can pick an arbitrary point in time, download all the test results at once, and parse them.

Now, the workflow itself can vary, but the question is: how do we test the speed of the interpreters at, say, lexing?

  • With the plain time command?
  • Using perf stat?
  • Using Apache Bench plus some proxy that feeds tests into the interpreters (to get some GC going)?
  • ???

@ThePrimeagen

novusnota changed the title from "Discussion: speed tests (say, for lexers)" to "Discussion: how should we test the performance?" on May 27, 2023
@Crain-32 (Contributor)

You might enjoy this video from Dave's Garage, https://youtu.be/pSvSXBorw4A, where something similar was done. The issues I can see are the following.

  • This promotes "bad practice" by seeking optimization over readability. I'm not saying the two are mutually exclusive, but I think at the core we're mostly trying to create this interpreter in line with the book.

  • As you mentioned, measuring this is hard. Do you rank in terms of memory? In terms of raw speed? JIT languages are going to be cold, so do you "warm up the JIT" for the measurements?

However, this is from the viewpoint of "sticking to the book" and keeping things intro-friendly.

I've personally been thinking it would be cool to see different approaches/features after we clear the initial book. Maybe that involves hyper-optimizing for performance, maybe it involves outputting the AST as a PNG, or making a REPL, etc. Then going back and seeing what has been done. It would make for good content on stream, and let some people flex their creative muscles.

Then we can rank them based on how creative they are. That's language-agnostic, which performance isn't.

@ThePrimeagen (Owner)

so i have some requirements for this to become a performance test. if you watch any of the videos i have done thus far, you know a performance test is something i will do.

so for this to become a performance test i want to turn each of the steps into a server: serve out the tokens, maybe the same with the parser, and then finally run the program itself and return its output. to me this is a proper performance test. it tests the language in a more real way: sys calls, memory usage, and all of that fun, instead of some toy example.

@ThePrimeagen (Owner)

second, as far as code goes: i want to have everything written in such a way that it is very simple. then it would be fun to talk about the performance of each and see how to improve it.

@Crain-32 (Contributor) commented May 31, 2023

which if you watch any of the videos i have done thus far

Hitting me with the Tuesday Callout D:

Makes sense. I'm assuming you'll dig more into that on stream, so we can make sure PRs properly maintain it?

@bhansconnect (Contributor) commented May 31, 2023

For the server approach, do you mean that each language will run whichever server framework the author chooses, or would we keep each language as a CLI app and just wrap them all with the same server?

If we aren't trying to test which server framework is best, I would suggest making each implementation a long-running CLI app, all called by the same server. The server would launch the app and then basically send requests to it (over stdin/stdout) like you would a REPL. That would still exercise GC and whatnot; it just wouldn't require a dependency on server frameworks.
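A minimal sketch of that wrapper, assuming a hypothetical long-running ./bin/monkey binary that reads one request per line on stdin and answers with one line on stdout (TypeScript/Node purely for illustration; the binary name, port, and line protocol are all made up):

import { spawn } from "node:child_process";
import { createInterface } from "node:readline";
import { createServer } from "node:http";

// Launch the language's long-running CLI app once (hypothetical binary).
const child = spawn("./bin/monkey", { stdio: ["pipe", "pipe", "inherit"] });
const fromChild = createInterface({ input: child.stdout! });
const waiting: ((line: string) => void)[] = [];
fromChild.on("line", (line) => waiting.shift()?.(line));

// One shared server feeds every request to the CLI over stdin/stdout, REPL-style.
createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    waiting.push((answer) => res.end(answer + "\n"));
    child.stdin!.write(body.replace(/\r?\n/g, " ") + "\n"); // one request per line
  });
}).listen(8080);

Since the same child process stays alive across requests, GC and JIT warm-up in the implementation under test still come into play, without pulling a server framework into every language.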

@novusnota (Contributor, Author)

I think he means something similar to the Go vs. Rust vs. TypeScript repo and video. But yeah, the details should really be discussed later, once parsers are implemented for all the languages in the repo.

@xslendix (Contributor) commented Jun 2, 2023

Use the rdtsc x86 instruction to read the CPU cycle counter at the start of the program's execution (not including compile time, where applicable) and again at the end. end - start gives you CPU cycles you can compare; repeat the previous steps a few more times per program, take the average, and boom, job done.
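rdtsc itself needs native code, but the repeat-and-average harness around it can be sketched from the outside. A rough stand-in using a monotonic timer instead of the cycle counter (TypeScript, with hypothetical binary and file names):

import { execFileSync } from "node:child_process";

const RUNS = 10;
let totalNs = 0n;

for (let i = 0; i < RUNS; i++) {
  const start = process.hrtime.bigint();              // monotonic clock, nanoseconds
  execFileSync("./bin/monkey", ["testcode.monkey"]);  // hypothetical interpreter + test program
  totalNs += process.hrtime.bigint() - start;
}

console.log(`average over ${RUNS} runs: ${totalNs / BigInt(RUNS)} ns`);

This measures wall time rather than cycles, so it's only a stand-in for the rdtsc approach, but the repeat-and-average idea is the same.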

@Crain-32 (Contributor) commented Jun 5, 2023

I hate to jump too far ahead, but I'm interested in whether we could have a better outline of how this will work.

There are plenty of solutions that are going to be... interesting to benchmark, to say the least. Bash/Google Sheets/ChatGPT are ones that come to mind instantly. I know JJDSL would need some changes to accept stuff during runtime. I'm not looking for the final method, just an idea of how we can expect the input to happen. Here are the options that come to mind:

  • stdin
    • stdin + File Reference (Might be useful if we want programs to parse large files)
  • HTTP
  • Socket Protocol of some kind

Likewise there's the output, but I'm of the opinion that output is easier to handle than input, so I'm less worried there.

@codebrainz (Contributor)

Could probably do something like this, to include all the crumminess of the compilers/interpreters/JITs/etc. Probably average over a bunch of runs or something?

$ time sh -c 'cat testcode.monkey | foo-lang-impl > foo-lang-test-results'

@Spongman (Contributor) commented Jun 7, 2023

It's probably important to run/build the tests in Docker to avoid the nightmare of configuring the correct runtime environment for all the languages simultaneously.

To that end, I added the following to my Makefile:

docker-time: docker-build
	docker run -i -v $(shell pwd):/deez deez_$(notdir $(shell pwd)) time ./bin/TsRustZigDeez

so then I can just run it like this:

$ cat test.monkey
let ackerman = fn(m,n) if (m == 0) n + 1 else if (n == 0) ackerman(m-1, 1) else ackerman(m-1, ackerman(m, n-1))
ackerman(3,8)

$ cat test.monkey | make docker-time
docker build . -t deez_cpp-spongman
[+] Building 7.3s (13/13) FINISHED                                                                                                                                                                              
...
docker run -i -v /home/piersh/ts-rust-zig-deez/cpp-spongman:/deez deez_cpp-spongman time ./bin/TsRustZigDeez
repl
> nil
> 2045
> 
real  0m 2.65s
user    0m 2.24s
sys     0m 0.39s

@ThePrimeagen (Owner)

so my personal thought on this is that for a language to be "a part of the test" we are going to make MLAAS

POST endpoints:

  /lex    returns the JSONified tokens
  /parse  (unsure yet)
  /exec   returns the program's output

that way we can test these perfs as we go. i am going to be building a client using Turso (ad) and Drizzle (not an ad, seems like a neat orm to try out). That work will probably start today. Should be fun!
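A rough sketch of what the /lex endpoint could look like, assuming a TypeScript implementation whose lexer can be iterated into an array of tokens (the Lexer import, port, and token shape are placeholders, not the repo's actual API):

import { createServer } from "node:http";
import { Lexer } from "./lexer"; // placeholder import; each implementation exposes its own lexer

createServer((req, res) => {
  let source = "";
  req.on("data", (chunk) => (source += chunk));
  req.on("end", () => {
    if (req.method === "POST" && req.url === "/lex") {
      // Run the lexer to completion and return the JSONified token stream.
      const tokens = [...new Lexer(source)]; // assumes an iterable lexer; adapt to the real token API
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify(tokens));
    } else {
      res.writeHead(404).end(); // /parse and /exec would hang off the same dispatch
    }
  });
}).listen(3000);

/parse and /exec would follow the same pattern, returning the JSONified AST (format still open) and the program's output respectively.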

@Spongman (Contributor) commented Jun 7, 2023

Interesting, I assume you mean HTTP POST? Does that mean each language needs to implement an HTTP server? What about the assembler guy? Does he have to write an HTTP server in asm? How do we isolate the performance of the language implementation from the performance of the HTTP server?

@Crain-32 (Contributor) commented Jun 7, 2023

I'm assuming here, but we'd probably disregard the HTTP timing, so you wouldn't need to implement the HTTP server in your language; you'd just have to wrap your solution. For example (in Java/Spring):

@PostMapping("/rest/lexer")
public Long timeLexer(@RequestBody String monkeyScript) {
   long startTime = System.currentTimeMillis();
   Lexer lexer = new Lexer(monkeyScript);
   // Assume we parse all tokens.
   return System.currentTimeMillis() - startTime;
}

This would allow each implementation to "disregard" most of the HTTP overhead and only return the rough actual time cost.

The main exception I can think of is the ASM implementation, which might have to deal with additional overhead in calling it, but they could probably just wrap it in some C/C++ and do it the same way.

@Spongman (Contributor) commented Jun 7, 2023

I see an obvious optimization there ;-)

@Crain-32 (Contributor) commented Jun 7, 2023

I'll entertain you.

//@PostMapping("/rest/lexer")
//public Long timeLexer(@RequestBody String monkeyScript) {
//   long startTime = System.currentTimeMillis();
   Lexer lexer = new Lexer(monkeyScript);
   // Assume we parse all tokens.
//   return System.currentTimeMillis() - startTime;
//}

Since we're removing overhead, we only care about new Lexer(string), which doesn't need to be optimized: assuming we use ZGC, object churn isn't an issue. If your obvious optimization is to not use Java, bad joke.

@Spongman (Contributor) commented Jun 7, 2023

IMO the interaction with the language implementation should just be via stdin/stdout (as this is what the book implements). this is the simplest thing that removes all other variables. if you want to wrap that in a standard web server that services requests, runs docker, and pipes the requests/results in/out of it, that's fine, but i'm not entirely sure what you're testing at that point. there's no need to implement the timer inside the interpreter code; time ./bin/xyz is sufficient to test startup & execution.

@Crain-32 (Contributor) commented Jun 7, 2023

stdin/stdout would work, but we have implementations in stuff like Google Sheets and Scratch. You could argue that we don't need to test those, or we could wrap them in something that can take stdin/stdout, but then you're also comparing the time the wrapper takes. And in the case of a language with a runtime, I don't want to time the language's startup cost.

If we're going to compare implementations, I only really care to see the time difference in the code itself; everything else feels like noise.

@Spongman (Contributor) commented Jun 7, 2023

yeah, google sheets & scratch are going to require some kind of wrapper whatever everyone else uses. stdin/stdout just seems like the baseline because that's what everyone (else) is already implementing.

IMO startup time is a big factor. if the C runtime took 20 seconds to start, nobody would use it regardless of how efficient the compiled code was.

@Crain-32 (Contributor) commented Jun 7, 2023

Depends on the context. A long-running server's startup time doesn't matter, since it's going to be on for a long time. If you have a client app/burst application, then it's going to matter more.
Maybe we measure both?

@Spongman (Contributor) commented Jun 7, 2023

yeah, take your pick, single-request or long-running:

  • cgi
  • fastcgi
  • microservice
  • lambda


@xslendix (Contributor)

3Days has some HTTP stuff, so I think a web server in HolyC can be done. I don't know whether it can be done without breaking compatibility with what it should actually be tested on, which is TempleOS itself.
