Ideas for improvement [RFC] #625

linrock · 2020-04-22T17:30:25Z

This issue is for misc ideas for improving the site. Feel free to suggest anything, big or small, or brainstorm ideas here.

Some ideas for starters:

Real-time updates on the homepage. I find I sometimes refresh the homepage a lot when checking up on the status of tests. Would be nice to have the option of seeing changes show up in real-time without manually refreshing the page.
Link to "My tests" in the sidebar. If you're on a tests_view page, there's no direct way to get back to the list of your tests.
Show the status of a test prominently on tests_view. There's no indicator that a test has finished other than the "Purge" button on the right. If a test is running, there's a "Stop" button there. It'd be nice to not have to think, and see right away the status of a test.
Unit tests for statistical calculations. The only guarantee that these stats are correct is the fact that Stockfish's Elo goes up with patches from green LTC tests. We have no other way of guaranteeing that the stats calculations are correct afaik.
Save entered fields on tests_run if the test submission fails. Right now if a test submission fails, ALL fields are cleared and you have to start over. Better would be if all fields were left as is.
Prevent submitting tests_run form if fields are obviously missing or invalid. Catch mistakes right away instead of letting obviously invalid data get sent to the server.
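A minimal sketch of the "catch obviously invalid fields" check. It is written in Python for consistency with fishtest's server code; in the browser the same rules would be expressed in JavaScript. The field names (`tests_repo`, `new_tag`, `base_tag`, `tc`, `test_type`, `spsa_params`) and the function `validate_run_form` are hypothetical, and the TC check assumes only the simple `<base>+<increment>` format.

```python
import re

# Hypothetical field names; the real tests_run form may use different keys.
REQUIRED_FIELDS = ("tests_repo", "new_tag", "base_tag", "tc")
# Assumes the simple "<base>+<increment>" TC format, e.g. "10+0.1".
TC_PATTERN = re.compile(r"^\d+(\.\d+)?\+\d+(\.\d+)?$")


def validate_run_form(form: dict) -> dict:
    """Return a mapping of field name -> error message (empty if the form looks valid)."""
    errors = {}
    for field in REQUIRED_FIELDS:
        if not str(form.get(field, "")).strip():
            errors[field] = "required"
    tc = str(form.get("tc", "")).strip()
    if tc and not TC_PATTERN.match(tc):
        errors["tc"] = "expected <base>+<increment>, e.g. 10+0.1"
    # SPSA tests without any parameters cannot run.
    if form.get("test_type") == "SPSA" and not str(form.get("spsa_params", "")).strip():
        errors["spsa_params"] = "SPSA tests need at least one parameter"
    return errors
```

Returning a dict keyed by field name makes it easy to highlight exactly the offending inputs and to keep every entered value in place instead of clearing the form.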

vondele · 2020-04-22T18:24:47Z

all good ideas :-)

One slightly more complicated idea: to make things more searchable, can we link related tests to each other? E.g. a reschedule could result in new fields 'rescheduled as: ...' and 'rescheduled from: ...' in the detailed info of a test. This way one could easily find the STC result of an LTC test, or the other way around. Alternatively, one could search on the git SHA, which would provide similar functionality.

linrock · 2020-04-22T19:47:28Z

@vondele that makes sense. maybe adding a rescheduled from row to start with?

search would generally be useful. once we have enough ideas for what search should be able to do, we can work towards making it happen

vondele · 2020-04-22T19:59:56Z

yes, we can start with a rescheduled from field only (that can be empty of course). The examples are possible layouts? I have no strong preference.

Should we add search ideas to the open search issue?

linrock · 2020-04-22T20:02:52Z

yea these are just possible layouts. i think rescheduled_from could show up under the test run id if you're looking at a rescheduled test. otherwise, it would not be there.
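A rough sketch of how a `rescheduled_from` field could be recorded and followed back to the original test. Runs are modeled as plain dicts; the `_id` and `results` keys and the helpers `reschedule` and `reschedule_chain` are hypothetical, and `uuid4` stands in for whatever id the database would assign.

```python
import copy
import uuid


def reschedule(run: dict) -> dict:
    """Create a copy of `run` for rescheduling, remembering which test it came from."""
    new_run = copy.deepcopy(run)
    new_run["_id"] = uuid.uuid4().hex  # stand-in for the database-assigned id
    new_run["rescheduled_from"] = run["_id"]
    new_run["results"] = {"wins": 0, "losses": 0, "draws": 0}  # hypothetical fresh results
    return new_run


def reschedule_chain(run_id: str, runs_by_id: dict) -> list:
    """Follow rescheduled_from links from `run_id` back to the original test."""
    chain = [run_id]
    seen = {run_id}
    while True:
        parent = runs_by_id[chain[-1]].get("rescheduled_from")
        # Stop at the original test, at a missing parent, or on a (corrupt) cycle.
        if parent is None or parent in seen or parent not in runs_by_id:
            return chain
        chain.append(parent)
        seen.add(parent)
```

Storing only the backward link keeps writes simple; a "rescheduled as" view could be derived later by querying for runs whose `rescheduled_from` equals the current id.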

sure, adding search ideas to the search issue sounds good: #339

ppigazzini · 2020-04-22T20:46:12Z

Show "Master diff" link only when base branch differs from Stockfish master (e.g. gray out "Master diff" otherwise)

vondele · 2020-04-23T09:13:32Z

the list of LTC tests doesn't show the SMP LTC tests. Presumably this is because the filter checks the TC only; it could use TC * threads > 30 or so.
Maybe the list of active tests on the main page could have buttons to show all LTC tests... not sure if this clutters the UI too much, but I find myself looking through the list for running LTC tests more often. Arguably, this could also be addressed by an eventual search option: Add a system of filters to find tests #339

linrock · 2020-04-24T05:12:07Z

sparklines for params in SPSA tests. hover over a param to see its trend over time. maybe helpful to see if a param is converging or if it's noisy.
show unfinished tasks first on tests_view pages. so if a run is stuck at almost-finished, you can quickly get a sense of how much longer until the last task finishes
don't allow submitting test run forms that are obviously invalid (missing SPSA params, missing test repo). highlight the field that's invalid. this is quick to check client-side and prevents invalid requests reaching the server
if a test is still running, show how long it's been running. if it's finished, show that it's finished
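For the SPSA sparkline idea, a minimal text-only sketch: it maps a parameter's history linearly onto eight Unicode block heights. Where the per-parameter history would come from is an assumption; the function name `sparkline` is hypothetical.

```python
BARS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Render a sequence of numbers as a compact Unicode sparkline string."""
    values = list(values)
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant parameter: draw a flat line.
        return BARS[0] * len(values)
    # Map each value linearly onto the 8 available bar heights.
    return "".join(BARS[round((v - lo) / span * (len(BARS) - 1))] for v in values)
```

Because each line is rescaled to its own min/max, it shows the trend (converging or noisy) rather than absolute values, which fits a hover tooltip.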

vdbergh · 2020-04-24T05:38:29Z

@linrock I agree with the need for unit tests in the stats code to catch bugs when updating code. The reason that there aren't any is that I do not know how unit testing works in fishtest.

This being said: unit tests by themselves are very far from guaranteeing correctness. They satisfy the GIGO principle.

But with regard to correctness, the situation is not as bad as you make it sound.

(1) First of all, all stats code can be validated by simulation. There are two simulators available

(a) https://github.com/vdbergh/simul is written in C and determines the pass probabilities by simulation. It is fast enough to determine pass probabilities with 4 decimal digits. With the current implementation the error probabilities are 0.0500 even with quite large batch sizes. Both pass probabilities and average running times agree perfectly with the predictions of the SPRT calculator

https://tests.stockfishchess.org/html/SPRTcalculator.html

(b) https://github.com/vdbergh/pentanomial is written in Python and also validates the Elo estimates. It has become very slow however.

(2) Secondly I have taken care to document all the math that is used in the stats code. See here
http://hardy.uhasselt.be/Fishtest/ .

EDIT: Perhaps I should add that the raw stats page for each test shows a lot of additional information on intermediate quantities used by the stats code. This serves as additional audit information. In particular, the crucial LLR quantity is calculated in 7 different ways (3 pentanomial, 4 trinomial), which also provides an important sanity check.

31m059 · 2020-04-24T20:38:16Z

Link to "My tests" in the sidebar. If you're on a tests_view page, there's no direct way to get back to the list of your tests.

I wholly support this idea, I think it would be very useful!

One other related feature I would request is a "My LTCs" page. We actually already have this functionality for the most part, although the filter is only applied to finished tests and not pending ones, see:
https://tests.stockfishchess.org/tests/user/31m059?ltc_only=1
So a narrowly-modified version of this could also be added to the sidebar.

linrock · 2020-04-25T02:20:43Z

@31m059 cool, i opened a PR to add the "my tests" link. i can fix that LTC issue later. the "my tests" page can become more useful over time if it were filterable or searchable.

@vdbergh that sounds good. pretty important to have a sanity check on stats since it's foundational to progress. if you have any particular numbers or data you'd want to be tested, feel free to post them and i can convert them into unit tests.

vdbergh · 2020-04-25T07:45:30Z

@linrock If you just point me to some other unit tests in fishtest I can copy the model.

I am afraid there is not much more that can be done than to check the output of some of the basic stats functions for some examples. I guess one could also simulate a few tests with a predetermined seed and check that the outcome and duration (for SPRT) of the test is as previously recorded.

Needless to say, none of this gives any guarantee of the actual correctness of the stats code; establishing that requires simulating many millions of tests. But it would certainly be useful as a quick sanity check against changes with unintended consequences.
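A sketch of the "simulate a few tests with a predetermined seed" idea. To stay self-contained it uses a generic SPRT with a Gaussian approximation of the LLR on game scores (Wald bounds log(β/(1−α)) and log((1−β)/α)); this is NOT fishtest's stats code, which uses the trinomial/pentanomial models documented above. `simulate_sprt` and its parameters are hypothetical; a real regression test would call the actual stats functions and compare against recorded outcomes.

```python
import math
import random


def elo_to_score(elo: float) -> float:
    """Expected score for an Elo difference under the logistic model."""
    return 1.0 / (1.0 + 10.0 ** (-elo / 400.0))


def llr_normal(total: float, total_sq: float, n: int, s0: float, s1: float) -> float:
    """Gaussian approximation of the log-likelihood ratio for mean score s1 vs s0."""
    if n < 2:
        return 0.0
    mean = total / n
    var = total_sq / n - mean * mean  # sample variance of the game scores
    if var <= 0.0:
        return 0.0
    return (s1 - s0) * (2.0 * total - n * (s0 + s1)) / (2.0 * var)


def simulate_sprt(probs, elo0, elo1, alpha=0.05, beta=0.05, seed=42, max_games=200_000):
    """Play simulated games with (win, draw, loss) probabilities until the SPRT stops."""
    rng = random.Random(seed)  # fixed seed -> reproducible outcome and duration
    lower = math.log(beta / (1.0 - alpha))
    upper = math.log((1.0 - beta) / alpha)
    s0, s1 = elo_to_score(elo0), elo_to_score(elo1)
    p_win, p_draw, _p_loss = probs
    total = total_sq = 0.0
    for n in range(1, max_games + 1):
        u = rng.random()
        score = 1.0 if u < p_win else 0.5 if u < p_win + p_draw else 0.0
        total += score
        total_sq += score * score
        llr = llr_normal(total, total_sq, n, s0, s1)
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "inconclusive", max_games
```

A unit test would store the `(outcome, games)` pair once and assert it is unchanged after edits, which catches unintended consequences without claiming anything about correctness.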

vondele · 2020-04-25T13:04:21Z

on the individual test page, allow changing 'Auto-purge on/off' with 'modify'; right now it can only be set on test submission.

vondele · 2020-04-25T14:29:07Z

if it is technically feasible, auto-increasing the maxgames for SPRT tests would be perfectly fine with me. One could start with a small maxgames value and, whenever the number of already allocated tasks gets large, increase it by a factor of 1.2 or so.
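A minimal sketch of the growth rule. The 1.2 factor comes from the suggestion; the 90% trigger, the 250 games per task, and the function name `maybe_grow_max_games` are assumptions for illustration.

```python
import math

GROWTH_FACTOR = 1.2      # from the suggestion above
TRIGGER_FRACTION = 0.9   # hypothetical: grow once 90% of max_games is allocated


def maybe_grow_max_games(max_games: int, allocated_games: int, games_per_task: int = 250) -> int:
    """Return a (possibly) increased max_games for a running SPRT test."""
    if allocated_games < TRIGGER_FRACTION * max_games:
        return max_games
    grown = math.ceil(max_games * GROWTH_FACTOR)
    # Round up to a whole number of tasks so the last task is not truncated.
    return math.ceil(grown / games_per_task) * games_per_task
```

Growing geometrically keeps the number of adjustments logarithmic in the final test length, so long SPRTs are not repeatedly stopped at an artificial limit.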

vondele · 2020-04-25T14:34:53Z

A little more controversial probably. Auto-reschedule passed SPRT STC tests as the corresponding LTC test, as soon as it has passed. Should probably be controlled with a 'Auto-reschedule on/off (default on)' tick box on submission and modifiable on the test page. Furthermore, auto-reschedule should only happen if the base is master, and still up-to-date.
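The conditions above can be written as one predicate. This is a sketch with a hypothetical `Run` record; the field names and the `"accepted"` state string are assumptions, not fishtest's data model.

```python
from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical fields; a real run document would carry much more information.
    tc_kind: str            # "STC" or "LTC"
    sprt_state: str         # "accepted", "rejected", or "" while running
    auto_reschedule: bool   # the proposed 'Auto-reschedule on/off' tick box
    base_branch: str
    base_sha: str


def should_auto_reschedule(run: Run, master_sha: str) -> bool:
    """Apply the proposed conditions for auto-rescheduling a passed STC as LTC."""
    return (
        run.auto_reschedule
        and run.tc_kind == "STC"
        and run.sprt_state == "accepted"
        and run.base_branch == "master"
        and run.base_sha == master_sha  # base must still be up to date
    )
```

Checking the SHA rather than only the branch name is what enforces "still up-to-date": a test started against an older master would not be rescheduled automatically.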

vondele · 2020-04-25T20:02:21Z

manual rescheduling should be possible even if the base bench doesn't match master. Probably there should be an indication that the master branch is outdated, but it shouldn't be a hard stop.

vondele · 2020-04-26T08:47:14Z

right now, the pgn of finished tasks can be downloaded for a while by clicking on the task Idx. That's hard to find unless one knows about it. Also, to download all games of a test one needs to do some scripting or click >100 times. A visible single-click option to download an archive containing all games of a test (i.e. all tasks) would probably be appreciated by some.

ppigazzini · 2020-04-26T09:00:47Z

IMO better to switch to an archive for the whole test, served by nginx. With 2k workers running daily the free disk space is very low (11GB), the pgns collection is inflated to 75GB :(
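A sketch of building one archive per test from its per-task PGN texts, which could then be written to a directory served by nginx. How the task PGNs are collected is assumed; `build_test_archive` is a hypothetical name. PGN games can simply be concatenated, separated by a blank line.

```python
import gzip


def build_test_archive(task_pgns) -> bytes:
    """Concatenate per-task PGN texts (in task order) into one gzip-compressed file."""
    # Skip missing/empty tasks; a blank line separates consecutive games.
    body = "\n\n".join(pgn.strip() for pgn in task_pgns if pgn) + "\n"
    return gzip.compress(body.encode("utf-8"))
```

A single compressed file per test is both the one-click download and a much smaller on-disk footprint than many per-task entries in the database.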

vondele · 2020-04-26T09:05:23Z

moving the pgns out of the database makes sense, IMO. Would it also make sense to limit the amount of stored pgns not by time but by volume? I.e. instead of a purge every 7 days (or whatever we use now), purge the oldest as soon as the collection is > XX GB. Having them separated by task (i.e. currently limited to 250 games per file) is sometimes convenient.
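A sketch of the volume-based purge: given stored PGN entries, pick the oldest ones to delete until the total fits under a cap. The `(pgn_id, timestamp, size)` tuple layout and `ids_to_purge` are hypothetical; the actual deletion would be done by whatever store holds the files.

```python
from typing import Iterable, List, Tuple


def ids_to_purge(entries: Iterable[Tuple[str, float, int]], cap_bytes: int) -> List[str]:
    """Pick the oldest entries to delete so the remaining total size fits under cap_bytes.

    Each entry is (pgn_id, timestamp, size_in_bytes).
    """
    ordered = sorted(entries, key=lambda e: e[1])  # oldest first
    total = sum(size for _, _, size in ordered)
    purge = []
    for pgn_id, _, size in ordered:
        if total <= cap_bytes:
            break
        purge.append(pgn_id)
        total -= size
    return purge
```

Calling it separately for STC and LTC entries, each with its own cap, would avoid one group crowding out the other, e.g. the older LTC PGNs.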

ppigazzini · 2020-04-26T09:55:05Z

STC pgns purge (should be 2 days) stopped working 2 weeks ago, I opened #642.
We should remove the link if the pgn is missing; this would help to catch such bugs.

noobpwnftw · 2020-04-26T10:10:16Z

Consider converting pgn database into a capped collection?

ppigazzini · 2020-04-26T13:50:37Z

STC pgns purge (should be 2 days) stopped working 2 weeks ago, I opened #642.

Fixed by @tomtor with #643 :)

@noobpwnftw with a single capped pgns collection in case of problems we could lose most of LTC PGNs (the older data)

vdbergh · 2020-04-27T07:20:53Z

Currently a worker update is a bit cumbersome since, if I understand correctly, @noobpwnftw has to do it manually (otherwise there would be too many downloads from GitHub).

I wonder if the updating could happen through the Fishtest server. Using an api call the server could download a new worker from github once and then cache it.

ppigazzini · 2020-04-27T09:38:40Z

Only 12 KB for the zipped worker files (skipping the requests package that should be installed with pip).
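A sketch of the "download once from GitHub, then cache" idea on the server side. The download itself is an injected callable (`fetch`, hypothetical), so no real GitHub or fishtest API is assumed; `WorkerPackageCache` is also a hypothetical name.

```python
import threading
from typing import Callable, Dict


class WorkerPackageCache:
    """Download each worker version from upstream once, then serve the cached bytes.

    `fetch` is a hypothetical callable (version -> zip bytes), e.g. a single
    GitHub download; it is injected so the cache can be tested offline.
    """

    def __init__(self, fetch: Callable[[str], bytes]):
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}
        self._lock = threading.Lock()

    def get(self, version: str) -> bytes:
        with self._lock:  # concurrent requests trigger at most one download per version
            if version not in self._cache:
                self._cache[version] = self._fetch(version)
            return self._cache[version]
```

Holding the lock during the download serializes fetches, which seems acceptable for a small, rarely updated package and guarantees that thousands of workers asking at once cause a single upstream request.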

Alayan-stk-2 · 2020-05-02T18:54:29Z

Don't reset base branch/base signature when switching to NumGames. I assume the feature is there for when one clicks SPSA, as SPSA tests should have identical base and test branches, but for NumGames it just wastes time if base was filled before selecting NumGames.

ppigazzini · 2020-05-04T19:34:37Z

@Alayan-stk-2 should be fixed now.

snicolet · 2020-05-05T19:35:02Z

Suggestion: allow the submitter of a test to modify (using the Modify button) the notes of a running test.

silversolver1 · 2020-05-16T16:38:22Z

One idea is: on pages of individual tests, be able to click user's username to go to the page containing all their tests

ppigazzini · 2020-05-16T17:50:17Z

@silversolver1 already implemented :)

silversolver1 · 2020-05-16T22:20:09Z

My mistake, I should have been more specific. I mean the page of an actual test, where a user can access the stop/purge, reschedule, or modify buttons: next to where the value for itp is listed, there is the word "username". There, the user's particular name does not currently link to that user's /tests/user/ page, which is the feature I am suggesting.

linrock · 2020-09-06T20:57:21Z

@ppigazzini oops, can this be re-opened? didn't expect that linking to a comment in this issue from that PR would close it. still a bunch of unresolved suggestions in this issue

linrock mentioned this issue Apr 22, 2020

Show short new/base commit hashes on tests_view pages #624

Merged

ppigazzini added the enhancement label Apr 22, 2020

linrock mentioned this issue Apr 22, 2020

Hide master diff link if base branch is same as stockfish master #631

Merged

linrock mentioned this issue Apr 25, 2020

Link to your tests in the sidebar for logged-in users #640

Merged

linrock mentioned this issue Apr 26, 2020

Enable modifying auto-purge on tests_view pages #641

Merged

vdbergh added a commit to vdbergh/fishtest that referenced this issue May 3, 2020

A quick attempt at fixing official-stockfish#625 (comment)

f68b4c3

vdbergh mentioned this issue May 3, 2020

A quick attempt at fixing https://github.com/glinscott/fishtest/issue… #658

Merged

ppigazzini pushed a commit that referenced this issue May 4, 2020

A quick attempt at fixing #625 (comment)

3a0057d

linrock added a commit to linrock/fishtest that referenced this issue May 10, 2020

Show link to original test when viewing a rescheduled test

c525a94

Implements vondele's suggestion here: official-stockfish#625 (comment)

linrock added a commit to linrock/fishtest that referenced this issue May 10, 2020

Show link to original test when viewing a rescheduled test

c3c6f14

Implements vondele's suggestion here: official-stockfish#625 (comment)

linrock mentioned this issue May 10, 2020

Show link to original test when viewing a rescheduled test #665

Merged

ppigazzini pushed a commit that referenced this issue May 11, 2020

Show link to original test when viewing a rescheduled test

6e764ee

Implements vondele's suggestion here: #625 (comment)

linrock mentioned this issue Sep 6, 2020

Link username to user tests page from tests_view pages #788

Merged

ppigazzini closed this as completed in #788 Sep 6, 2020

ppigazzini reopened this Sep 6, 2020

ppigazzini pinned this issue Aug 20, 2021
