Add postgrest-loadtest based on vegeta #1812
Conversation
Awesome!
Yes, absolutely!
I'd love to have those commands available :D Amazing work Wolfgang 💯 🥇
I like how simple the HTTP test is. One thing though: would it be possible to add a random id (as a query param) that changes on every request? Similar to what I do here. This would be useful for future load tests we might add.
I guess this would be possible, but it would require a bit of overhead around it. Not as simple anymore, for sure. What advantage would you expect if we randomized the id?
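For a sense of that overhead: vegeta reads a static targets file, so randomized ids would mean generating that file before every attack. A minimal sketch, where the endpoint, table, and id range are all hypothetical:

```bash
# Generate a fresh randomized targets file per run (illustrative values only).
for _ in $(seq 1000); do
  echo "GET http://localhost:3000/articles?id=eq.$((RANDOM % 10000))"
done > targets.http
```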
I added a few tests regarding headers, because I wanted to know more about #1794 (comment).
Fairer tests, I thought. It might not be that relevant on GET: during my load tests on cloud, I noticed the same req/s with both same ids and random ids. Testing PATCH on different ids might make a difference in results (though maybe it's more of a PostgreSQL concern, see here).
Hm. I was about to argue that, compared to the "whole-system-k6-tests" you're running, where PostgreSQL and its performance are very important, we don't need to care about PostgreSQL in postgrest-loadtest. So we should probably draw the line around where the performance problem can be fixed:
Adding randomness also potentially makes test runs less comparable, unless we counter that with longer runtimes to balance out the random part. We need to keep in mind that the more test cases we run with the loadtest, the longer we need to run it to get proper results per test case. I don't think we need to be super careful, but we shouldn't add every possible test case either. The goal should be to have tests in place for all major paths through PostgREST - and then some additional tests for specific issues we've encountered and fixed, to avoid regressions. WDYT?
Very true. Let's keep random ids out until they're really needed.
Yes, fully agree.
Last night I felt brave and tried running the loadtest against v7.0.1. Anyway, the result was... surprising. v7.0.1 seemed to be much faster than the current HEAD. I wanted to see what was going on and wrote a small script to run the loadtest for every commit in between (rough sketch below).
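Not the actual script, but roughly what such a per-commit runner could look like (assuming `postgrest-loadtest` prints a vegeta report, and that per-commit builds are handled by the nix tooling):

```bash
#!/usr/bin/env bash
set -e
# Walk every commit between v7.0.1 and HEAD, run the loadtest on each,
# and record the achieved request rate for plotting.
for commit in $(git rev-list --reverse v7.0.1..HEAD); do
  git checkout --quiet "$commit"
  # Extract the rate from vegeta's "Requests [total, rate, throughput]" line.
  rate=$(postgrest-loadtest | awk '/^Requests/ { gsub(",", ""); print $6 }')
  echo "$commit,$rate" >> results.csv
done
```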
Here's the data, and here's how that looks. The commits with the major changes are (the first number is the position on the x-axis):
The first 2 are major improvements, but we knew about those before. The two drops are new, though. The coverage one is not critical: this drop in performance is not real (for production), it's just the effect of changing cabal's build flags for coverage. But... the last one is really surprising. I have no idea why the addition of configuration parameters from the database slows things down so much. This is something we need to investigate. @steve-chavez, could you do me a favor and test the following commits vs. the commit immediately before, with your k6 setup?
Another observation: the blue numbers (1s) are a bit jumpy, I feel. But there is not too big of a difference between orange and green, it seems. So it should be enough to run each test case for 5s and not 25s.
:O! Nice plotting work Wolfgang.
Sure! I'll create static binaries for those commits and report here. Meanwhile, I've run GETSingle on the latest nightly (d3a8b5f) - which already contains the coverage and in-db config changes - and on v7.0.1.

Nightly (…):

v7.0.1 (…):

Seems there's no regression.
I have done a few further tests and checks:
@steve-chavez In your setup, which parts (postgrest, database, workers) are on the same machine? All three? Or are any of them going through a real network?
postgrest + database are on the same machine (a t3a.nano, a low-end instance); the k6 runner is on another machine (t3a.medium). The two machines are in the same VPC (AWS's VLAN). So yep, the requests go through the network.
Figured it out: …

Just by removing those two lines, I am up at 5200 hypo/s again (including the disabled dev flags). Soo... we should either disable …
Aha! Disabling the prepared statements should have the most impact. Btw, regarding …

Edit: Added the 4 tests on #1812 (comment).
Yes, enabling prepared statements gave me the biggest boost. That 11% drop is probably that extra round-trip on every request then.
Thanks. I'm glad that we're still running fast :)
@monacoremo could you have a look at my latest commit 2b71f53? I hope that will fix the "missing GHC" error I got in CI. What do you think about the way I did it - any nix anti-pattern or something that I hit? ;)
(force-pushed from 2b71f53 to e38d88a)
@monacoremo another question for you - maybe you already have an idea. I am now at a stage where I want to parse not only arguments, but also options to those shell scripts (loadtest, and the changelog stuff, too). I thought I'd get my feet wet with the "do something with …
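For reference, plain bash gets surprisingly far with `getopts` before reaching for anything bigger; a minimal sketch with hypothetical flag names:

```bash
# Parse -d <duration> and -v before the positional <commit-ish> (names made up).
duration=5s
verbose=0
while getopts "d:v" opt; do
  case "$opt" in
    d) duration="$OPTARG" ;;
    v) verbose=1 ;;
    *) echo "usage: $0 [-d duration] [-v] [commit-ish]" >&2; exit 1 ;;
  esac
done
shift $((OPTIND - 1))
target="${1:-main}" # same default as postgrest-loadtest-against
```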
A few runs of the loadtest comparison.

First run:

```
Running loadtest on "main"...

Requests      [total, rate, throughput]         5698, 1139.48, 1139.25
Duration      [total, attack, wait]             5.002s, 5.001s, 1.011ms
Latencies     [min, mean, 50, 90, 95, 99, max]  640.089µs, 873.787µs, 802.365µs, 1.131ms, 1.236ms, 1.513ms, 3.974ms
Bytes In      [total, mean]                     239316, 42.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5698
Error Set:

Running loadtest on HEAD...

Requests      [total, rate, throughput]         5549, 1109.72, 1109.38
Duration      [total, attack, wait]             5.002s, 5s, 1.562ms
Latencies     [min, mean, 50, 90, 95, 99, max]  592.531µs, 897.615µs, 827.983µs, 1.172ms, 1.265ms, 1.504ms, 4.553ms
Bytes In      [total, mean]                     233058, 42.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5549
Error Set:
```

Second run:

```
Running loadtest on "main"...

Requests      [total, rate, throughput]         5269, 1053.77, 1053.52
Duration      [total, attack, wait]             5.001s, 5s, 1.174ms
Latencies     [min, mean, 50, 90, 95, 99, max]  681.89µs, 945.223µs, 867.14µs, 1.232ms, 1.356ms, 1.679ms, 10.254ms
Bytes In      [total, mean]                     221298, 42.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5269
Error Set:

Running loadtest on HEAD...

Requests      [total, rate, throughput]         5934, 1186.62, 1186.39
Duration      [total, attack, wait]             5.002s, 5.001s, 980.151µs
Latencies     [min, mean, 50, 90, 95, 99, max]  653.65µs, 839.168µs, 772.683µs, 1.074ms, 1.176ms, 1.42ms, 2.3ms
Bytes In      [total, mean]                     249228, 42.00
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:5934
Error Set:
```

Third run (after rebase), now with the cabal dev flag (before was without):

Fourth run, with different request types. No report-per-test, yet:

Fifth run, different request types, 60s:

First run on GitHub Actions:

Seems like the machines here are a bit faster.
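For readers unfamiliar with the output above: it is vegeta's standard report. The runner boils down to something like this sketch (the file names and the rate/duration values are illustrative, not the actual script):

```bash
# Attack with the targets file, keep the raw results, and print the report.
vegeta attack -targets=targets.http -duration=5s -rate=1000 \
  | tee results.bin \
  | vegeta report
```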
I have just found https://argbash.io. Looks like we could integrate that into our scripts.

Edit: https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/tools/misc/argbash/default.nix ... yes, it's packaged in nixpkgs!
Looks super promising! That would bring all our scripts to the next level! I'm only familiar with …
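From memory of argbash's template syntax (details may be off): you annotate a script with macro comments, run argbash over it, and the generated parsing code exposes `$_arg_<name>` variables. A hedged sketch with made-up option names:

```bash
#!/usr/bin/env bash
# ARG_OPTIONAL_SINGLE([duration], [d], [How long to run each loadtest], [5s])
# ARG_POSITIONAL_SINGLE([commitish], [Commit-ish to compare against], [main])
# ARG_HELP([Run the loadtest against another commit])
# ARGBASH_GO

# The generated parsing code replaces the marker above; the body then uses:
echo "comparing against $_arg_commitish for $_arg_duration"
```

Generation would be something like `argbash template.m4 -o script` (file names hypothetical).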
(force-pushed from 06d2163 to 3b9debb)
To be able to run the loadtest consistently against different commits, we need to not only run the same requests, but also run them on the same schema. Assuming some PR makes a change and adds a loadtest to show the result, the same test, including fixtures, needs to be run against the main branch. To avoid a complicated procedure of needing to commit tests/fixtures first, etc., I was looking for a way to tell the run on the other branch to use the fixtures from the current working tree. To do this, I wanted to add a CLI option to `with_tmp_db` (rough sketch below).
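The shape of the idea, with a hypothetical flag name and path (this is not the actual interface):

```bash
# Run main's loadtest, but load the fixtures from the current working tree,
# so both sides of the A/B comparison use the same schema.
# --fixtures and the path below are made up for illustration.
postgrest-with-git main postgrest-loadtest \
  --fixtures "$PWD/test/loadtest/fixtures.sql"
```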
@monacoremo WDYT? Now, I added a very small schema for the loadtest and also more requests to test. My first goal was to match all request handlers in …
Yes, I see how that would be nice - especially with the argbash integration now. I was thinking whether there could be a compromise - e.g., generating and checking in a `with_tmp_db` based on the nix version. Should be doable with a bit of sed (changing the shebang, removing nix-store references, embedding the arg parsing, adding a comment that the file is generated, etc.) and we could check the consistency in CI as we do with …

Or we just maintain two versions of the script; it's been relatively stable... We should keep some version of the standalone script in any case, I think; otherwise we seriously gimp any non-nix developers. WDYT?
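A rough sketch of that sed-based generation, under the stated assumptions (paths and the exact substitutions are hypothetical):

```bash
# Turn the nix-built script into a standalone, checked-in version.
sed -e '1s|.*|#!/usr/bin/env bash|' \
    -e 's|/nix/store/[^/]*/bin/||g' \
    result/bin/with_tmp_db > tools/with_tmp_db
chmod +x tools/with_tmp_db
# CI would regenerate the file and diff it to catch drift.
```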
I don't think there is a reason to do that. I can see the reasoning for allowing builds with stack, but not for building without nix. All our dev environment is based on nix, plus we have the docker-compose setup for cases where that doesn't work natively. To be honest: the biggest hurdle for me getting up to speed initially was the fact that I was not using nix. Once I did, everything was so much smoother. Let's turn this around and create a …
(force-pushed from e69f3fe to 8d10670)
(force-pushed from 8d10670 to 3ba6c17)
I rebased this after the move to GitHub Actions. Right now, https://github.com/PostgREST/postgrest/runs/4228318238?check_suite_focus=true makes all the nix-based jobs fail randomly. The load test job is supposed to upload a markdown table to the checks page of the PR. This is not working right now, because PRs from a forked repo only have read access through the default GITHUB_TOKEN.
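For context, publishing to the checks page means creating a check run via the REST API, which needs `checks: write` permission - exactly what the read-only token of a fork PR lacks. A hedged sketch of such an upload (not the actual workflow step; the report file name is made up):

```bash
# Build the JSON body with jq and post it; requires a token with checks:write.
jq -n --arg sha "$GITHUB_SHA" --arg summary "$(cat loadtest-report.md)" \
  '{name: "loadtest", head_sha: $sha, conclusion: "neutral",
    output: {title: "Loadtest results", summary: $summary}}' \
  | gh api --method POST "repos/$GITHUB_REPOSITORY/check-runs" --input -
```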
(force-pushed from 8424d21 to 8ce34ba)
(force-pushed from 18d4373 to 49093e0)
So the loadtest report seems to be uploaded successfully, according to the workflow. Unfortunately, I have no idea where it ended up. Can't find it anywhere, yet. Will have to investigate later...
Something turned up here! https://github.com/PostgREST/postgrest/runs/4317691924?check_suite_focus=true

Need to check whether that went to the correct check run, but it seems to work great otherwise. Very stable results too, awesome work!
(force-pushed from 49093e0 to e444630)
Ah, nice find. The workflow that runs on …

Let's see what happens in a bit.
Ok, so that worked fine now; it shows up here. Right now the results are only added with a neutral conclusion. There is no comparison against a threshold yet, and the pipeline would not fail because of a drop in performance. We can improve the formatting of the results later on, too - and also write a more sophisticated python script to analyze the numbers in detail, e.g. separated by request; right now we just have one set of numbers for all kinds of requests mixed together. My idea would be to merge this now, and then let it run a bit on all pull requests and pushes to main. We can then get a feel for the numbers and see what kind of threshold we would like to implement. WDYT?
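A per-request breakdown should be cheap to get, since vegeta can re-encode its binary results as JSON lines. A small shell sketch (file name hypothetical; the real analysis would likely live in the python script mentioned above):

```bash
# Decode results to JSON lines and count requests per URL.
vegeta encode results.bin | jq -r '.url' | sort | uniq -c | sort -rn
```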
Amazing work! Great integration with our CI; it will be very useful to have.
(force-pushed from e444630 to 0428c79)
This is my take on #1609. As discussed in that PR, I ran a few tests with vegeta.
Compared to locust, as initially tested in #1609, I identified the following advantages:
I have only written one very basic test, so far. We are not doing too much with the results either, yet. However, I have added a nice framework around the test runner. We have:
- `postgrest-loadtest` to run the loadtests in the current working tree.
- `postgrest-loadtest-against <commit-ish>` to run the loadtests twice for A/B testing: once on the current working tree and once on the branch that is given by the first argument. It defaults to main; that means `postgrest-loadtest-against` will give you a test run comparing your current branch against main. This, with some fancy number crunching of results, could be a great tool for CI.
- I found `git worktree` (quite awesome!) and implemented `postgrest-with-git <commit-ish> <sub-command>`. This will check out `<commit-ish>` in a temporary directory and run the sub-command in it. This not only allows comparing loadtests on different branches, but also things like:
  - `postgrest-with-git remove-setup-hs nix-shell`. This started a new nix-shell on that branch. Made the change, pushed, `exit`, done. Ok, that's really what you would do with `git worktree` anyway - but I didn't know, so that was quite an awesome thing to do :D.
  - Want to know whether the tests still pass on another branch? `postgrest-with-git remo/untangle-the-world postgrest-test-spec-all` will answer that.
  - Need to run a brand-new tool on a branch that doesn't have it yet? `postgrest-with-git that-other-branch postgrest-my-new-tool` has you covered, because even though it runs in the other branch, it still takes the `postgrest-` command from the current instance of nix-shell. Of course, you can do this by just running `nix-shell` on one branch and then checking out the other branch. Once you have uncommitted changes, it becomes a bit complicated with possible merge conflicts, though... `postgrest-with-git` has you covered!

After writing this down, I am not sure anymore what the main feature of this PR was... ;)
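For illustration, a typical session with the commands described above might look like this (sketch):

```bash
nix-shell --arg loadtest true       # enter the dev shell with the loadtest tools
postgrest-loadtest                  # run on the current working tree
postgrest-loadtest-against          # A/B: working tree vs. main (the default)
postgrest-loadtest-against v7.0.1   # A/B: working tree vs. an arbitrary commit-ish
```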
All those tools are available behind `nix-shell --arg loadtest true` right now - although the only dependency is now vegeta. So we should just enable it by default, I guess?

There are a few smaller todos in code, but the bigger ones are (mostly taken from #1609):
Any feedback more than welcome!