Ideas for improvement [RFC] #625
Comments
all good ideas :-) One slightly more complicated: to make things more searchable, can we make edges between tests? E.g. a reschedule could result in a new field 'rescheduled as: ....' and 'rescheduled from: ...' in the detailed info of a test. In this way one could easily find the STC result of an LTC test or the other way around. Alternatively one could search on the git SHA, which would provide similar functionality. |
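To make the "edges between tests" idea concrete, here is a minimal sketch of how such links could be recorded. It is only an illustration: the `Run` class, its field names (`rescheduled_from`, `rescheduled_as`), and the helper functions are hypothetical and are not taken from the fishtest code base.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Run:
    """Hypothetical, simplified stand-in for a test record."""
    run_id: str
    git_sha: str
    time_control: str
    rescheduled_from: Optional[str] = None                    # edge to the parent test
    rescheduled_as: List[str] = field(default_factory=list)   # edges to child tests


def reschedule(runs: Dict[str, Run], source_id: str, new_id: str, time_control: str) -> Run:
    """Create a new run from an existing one and link the two in both directions."""
    source = runs[source_id]
    new_run = Run(new_id, source.git_sha, time_control, rescheduled_from=source_id)
    source.rescheduled_as.append(new_id)
    runs[new_id] = new_run
    return new_run


def find_by_sha(runs: Dict[str, Run], git_sha: str) -> List[Run]:
    """Alternative lookup: all runs that tested the same commit."""
    return [run for run in runs.values() if run.git_sha == git_sha]
```

With links in both directions, the detail page of an LTC could show its STC (via `rescheduled_from`) and the STC could list its LTCs (via `rescheduled_as`), while the SHA lookup covers tests that were resubmitted by hand.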
@vondele that makes sense. maybe adding a search would generally be useful. once we have enough ideas for what search should be able to do, we can work towards making it happen |
yes, we can start with a Should we add search ideas to the open search issue? |
yea these are just possible layouts. i think sure, adding search ideas to the search issue sounds good: #339 |
@linrock I agree with the need for unit tests in the stats code to catch bugs when updating code. The reason there aren't any is that I do not know how unit testing works in fishtest. That said, unit tests by themselves are very far from guaranteeing correctness; they are subject to the GIGO principle. But with regard to correctness, the situation is not as bad as you make it sound.
(1) First of all, all stats code can be validated by simulation. There are two simulators available:
(a) https://github.com/vdbergh/simul is written in C and determines the pass probabilities by simulation. It is fast enough to determine pass probabilities with 4 decimal digits. With the current implementation the error probabilities are 0.0500 even with quite large batch sizes. Both pass probabilities and average running times agree perfectly with the predictions of the SPRT calculator https://tests.stockfishchess.org/html/SPRTcalculator.html
(b) https://github.com/vdbergh/pentanomial is written in Python and also validates the Elo estimates. It has become very slow, however.
(2) Secondly, I have taken care to document all the math that is used in the stats code. See here
EDIT: Perhaps I should add that the raw stats page for each test shows a lot of additional information on intermediate quantities used by the stats code. This serves as additional audit information. In particular, the crucial LLR quantity is calculated in 7 different ways (3 pentanomial, 4 trinomial). This also provides an important sanity check. |
I wholly support this idea, I think it would be very useful! One other related feature I would request is a "My LTCs" page. We actually already have this functionality for the most part, although the filter is only applied to finished tests and not pending ones, see: |
@31m059 cool, i opened a PR to add the "my tests" link. i can fix that LTC issue later. the "my tests" page can become more useful over time if it were filterable or searchable. @vdbergh that sounds good. pretty important to have a sanity check on stats since it's foundational to progress. if you have any particular numbers or data you'd want to be tested, feel free to post them and i can convert them into unit tests. |
@linrock If you just point me to some other unit tests in fishtest I can copy the model. I am afraid there is not much more that can be done than to check the output of some of the basic stats functions for some examples. I guess one could also simulate a few tests with a predetermined seed and check that the outcome and duration (for SPRT) of the test is as previously recorded. Needless to say that none of this gives any guarantees for the actual correctness of the stats code. Establishing this requires simulating many millions of tests. But it would certainly be useful as a quick sanity check against changes with unintended consequences. |
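The "simulate a few tests with a predetermined seed" idea could look roughly like the sketch below. It assumes a deliberately simplified model: a plain Wald SPRT on win/loss outcomes with no draws, not the trinomial or pentanomial statistics fishtest actually uses. The function name `simulate_sprt` and all its parameters are hypothetical.

```python
import math
import random


def simulate_sprt(p_true, p0, p1, alpha=0.05, beta=0.05, seed=0, max_games=100_000):
    """Simulate one Wald SPRT of H0: p = p0 against H1: p = p1 on win/loss games.

    Returns (outcome, games_played). Simplified model without draws;
    this is an illustration, not the fishtest stats code.
    """
    rng = random.Random(seed)                   # fixed seed -> reproducible run
    lower = math.log(beta / (1 - alpha))        # accept H0 at or below this LLR
    upper = math.log((1 - beta) / alpha)        # accept H1 at or above this LLR
    win_step = math.log(p1 / p0)                # LLR increment for a win
    loss_step = math.log((1 - p1) / (1 - p0))   # LLR increment for a loss
    llr = 0.0
    for games in range(1, max_games + 1):
        llr += win_step if rng.random() < p_true else loss_step
        if llr >= upper:
            return "accept_h1", games
        if llr <= lower:
            return "accept_h0", games
    return "undecided", max_games


# A regression-style check: the same seed must reproduce outcome and duration.
first = simulate_sprt(p_true=0.52, p0=0.50, p1=0.55, seed=12345)
second = simulate_sprt(p_true=0.52, p0=0.50, p1=0.55, seed=12345)
assert first == second
```

In an actual unit test one would run the simulation once, record the returned outcome and game count, and assert against those recorded values so that an unintended change in the stats code shows up as a failure. As noted above, this is a sanity check only and says nothing about the error probabilities themselves.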
IMO better to switch to an archive for the whole test, served by nginx. With 2k workers running daily the free disk space is very low (11GB), the pgns collection is inflated to 75GB :( |
moving the pgns out of the database makes sense, IMO. Would it also make sense to limit the amount of stored pgns not by time but by volume? I.e. instead of a purge every 7 days (or whatever we use now), purge the oldest as soon as the collection is > XX GB. Having them separated by task (i.e. limited to 250 games per file currently) is sometimes convenient. |
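A volume-based purge as suggested here could be sketched as follows. This is only an illustration of the selection logic: the `PgnFile` record, its fields, and `select_for_purge` are hypothetical names, and the real storage (database or files served by nginx) is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PgnFile:
    """Hypothetical record for one stored per-task PGN file."""
    task_id: str
    created_at: float   # e.g. a Unix timestamp
    size_bytes: int


def select_for_purge(files: List[PgnFile], max_total_bytes: int) -> List[str]:
    """Return task ids to delete, oldest first, until the total fits the cap."""
    total = sum(f.size_bytes for f in files)
    to_purge = []
    for f in sorted(files, key=lambda f: f.created_at):
        if total <= max_total_bytes:
            break
        to_purge.append(f.task_id)
        total -= f.size_bytes
    return to_purge
```

Keeping one file per task means the purge always removes whole tasks, so the per-task files mentioned above stay intact for everything that survives.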
STC pgns purge (should be 2 days) stopped working 2 weeks ago, I opened #642. |
Consider converting pgn database into a capped collection? |
@noobpwnftw with a single capped pgns collection in case of problems we could lose most of LTC PGNs (the older data) |
Currently a worker update is a bit cumbersome since, if I understand correctly, @noobpwnftw has to do it manually (too many downloads from GitHub otherwise). I wonder if the updating could happen through the Fishtest server. Using an API call, the server could download a new worker from GitHub once and then cache it. |
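The "download once, then cache" idea can be sketched like this. It is an assumption-laden illustration: `WorkerArchiveCache` and its parameters are hypothetical, and the actual download from GitHub is passed in as a function so that the sketch does not presume any particular URL or HTTP library.

```python
import time
from typing import Callable, Optional


class WorkerArchiveCache:
    """Serve a cached worker archive, refetching only after a time-to-live expires."""

    def __init__(self, fetch: Callable[[], bytes], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical function that downloads the archive
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing
        self._data: Optional[bytes] = None
        self._fetched_at = 0.0

    def get(self) -> bytes:
        """Return the cached archive, downloading it first if missing or stale."""
        now = self._clock()
        if self._data is None or now - self._fetched_at >= self._ttl:
            self._data = self._fetch()
            self._fetched_at = now
        return self._data
```

An API view on the server would then call `get()` for every worker request, so GitHub sees at most one download per TTL period regardless of how many workers update.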
Only 12 KB for the zipped worker files (skipping the |
Don't reset base branch/base signature when switching to NumGames. I assume the feature is there for when one clicks SPSA, as SPSA tests should have identical base and test branches, but for NumGames it just wastes time if base was filled before selecting NumGames. |
@Alayan-stk-2 should be fixed now. |
Suggestion: allow the submitter of a test to modify (using the Modify button) the notes of a running test. |
Implements vondele's suggestion here: official-stockfish#625 (comment)
One idea is: on pages of individual tests, be able to click user's username to go to the page containing all their tests |
@silversolver1 already implemented :) |
My mistake, I should have been more specific. I mean on the page of an actual test, where a user can access the stop/purge, reschedule or modify buttons: next to where the value for itp is listed, there is the word "username". Here, the user's particular name does not currently link to that user's /tests/user/ page, which is the feature I am suggesting. |
@ppigazzini oops, can this be re-opened? didn't expect that linking to a comment in this issue from that PR would close it. still a bunch of unresolved suggestions in this issue |
This issue is for misc ideas for improving the site. Feel free to suggest anything, big or small, or brainstorm ideas here.
Some ideas for starters: