Untested try at rejecting workers submitting bad data #1394
Conversation
Marked as draft because it needs review/testing, but in principle it should be merged as soon as possible. This adds server-side filtering of the problematic tasks caused by official-stockfish#1393. Obviously we should also prevent this worker-side problem in the first place, but in the short run this should at least prevent fishtests from being polluted by garbage as they currently are.
Force-pushed from 0e42b18 to 26dca1d
Force-pushed from 53c0634 to d5ea015
Well, at least it passes CI now. I still can't say what it might do in production; I hope someone more knowledgeable can work on this and merge it.
One issue with this PR is that it is run after every update. So if a task starts with a time loss, the task will be cancelled, even if the total number of time losses turns out to be less than 1% (easily fixable of course). As an alternative to this PR we could also simply turn on auto purge again... Note that auto purge rejects tasks with …
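For illustration, a minimal sketch of the easy fix alluded to above: only reject a task once it has enough games for the 1% ratio to be meaningful, so a single early time loss cannot kill it. The names and thresholds (`MIN_GAMES`, `should_reject_task`) are hypothetical, not fishtest's actual code.

```python
# Hypothetical sketch, not the actual fishtest server code.
MIN_GAMES = 200              # assumed guard against tiny samples like 1/1
MAX_TIME_LOSS_RATIO = 0.01   # the 1% threshold discussed above

def should_reject_task(num_games: int, num_time_losses: int) -> bool:
    """Return True only once the time-loss ratio is conclusively bad."""
    if num_games < MIN_GAMES:
        # Too few games: a single early time loss should not cancel the task.
        return False
    return num_time_losses / num_games > MAX_TIME_LOSS_RATIO
```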
An advantage of the current PR would be that the offending workers would be flagged in the event log. The owner could then split the worker into several lower-core ones (this would just involve copying the worker directory and starting the subworkers with a lower concurrency value). Of course, doing this internally in fishtest would be preferable, but it is a substantial task.
Ah, I see. Thanks for the review. Yes, I concur that rejecting a 1/1 time loss isn't ideal, although it would have been an improvement when many workers were doing too many of these. So why was autopurging disabled? I am quite ignorant about fishtest and its history, as you can tell. (And of course, now seeing that, I should also have simply used …)
Both purging and deleting these batches are hacks... this needs to be solved at the root IMO. Cutechess simply can't deal with the concurrency. Even if there are no time losses, this could still severely distort time management. The only real solution IMO is to internally split the worker. We know the purging doesn't work well with NNUE, as results from machines with different hardware architectures differ.
Installing bad-data filters in fishtest isn't a hack; it's simply good engineering. That said, any source of bad data should also be solved at the root and prevented from occurring in the first place. By no means are the two approaches mutually exclusive; both strategies should be pursued to completion.
Is this a "doesn't run properly" sort of thing, or simply results out of whack with the test average? If the latter, perhaps we could separate the purging of atypical results from the purging of crashes/time losses. They are definitely different categories of error; perhaps we should handle them separately.
I think this problem is gone now. At least the chi^2 test does not indicate systematically deviating workers (other than time losses). Perhaps this has to do with the switch to the UHO book. The noob book, with its extremely high draw ratio, was perhaps more sensitive to small hardware differences.
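For context, a minimal sketch of the kind of chi^2 check being referred to: testing whether one worker's win/draw/loss counts deviate systematically from the pooled results of the other workers. The counts below are made up, and this is an illustration of the statistical idea, not fishtest's actual implementation.

```python
# Illustrative only: chi-squared contingency test for a deviating worker.
from scipy.stats import chi2_contingency

worker_wdl = [510, 320, 170]     # hypothetical W/D/L of the worker under test
others_wdl = [4900, 3300, 1800]  # pooled W/D/L of all remaining workers

chi2, p_value, dof, expected = chi2_contingency([worker_wdl, others_wdl])
if p_value < 0.001:
    print(f"worker deviates systematically (chi2={chi2:.1f}, p={p_value:.2g})")
```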
I think this would still show up for tests of new NNUE architectures. However, if we don't fix the concurrency problem, we'll just remove all contributions from linrock with his 95-core servers, and sebastronomy with 48 cores.
We should of course fix the cause of the symptom, but nevertheless bad data is bad data and should be removed from test statistics. That said, I now believe this PR isn't the best way to go about filtering the bad data. I think being able to purge crashes/time losses separately from "standard" residuals is the way to go. (I'm fine with keeping autopurging off; it's not a very future-proof tool.) The main problem is I don't know where the purge button is for existing tests on the site.
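A minimal sketch of the separation proposed here: purge hard failures (crashes/time losses) independently of statistical outliers. All names and the residual threshold are hypothetical, not fishtest's actual purge logic.

```python
# Hypothetical sketch: purge the two error categories separately.
def tasks_to_purge(tasks, purge_bad_games=True, purge_outliers=False,
                   residual_limit=2.7):
    """Yield tasks to purge, treating hard failures and outliers separately."""
    for task in tasks:
        stats = task["stats"]
        if purge_bad_games and (stats["crashes"] or stats["time_losses"]):
            yield task  # hard failures: always suspect
        elif purge_outliers and abs(task["residual"]) > residual_limit:
            yield task  # atypical results relative to the test average
```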
Ahhhhhh. I think, back when I was first poking around, I assumed that meant "purge from the DB", not anything relating to individual tasks... thereafter it was forever immediately filtered from my vision whenever I loaded a page...
@vondele We would encourage them to split up their workers into several lower-core workers. This is a completely trivial operation (just copy the worker directory and restart the workers with a lower number of cores). I agree this is not ideal, but doing the splitting internally in the worker is not a trivial operation and a volunteer needs to step up to do it. Note: while the linrock workers seem to genuinely suffer from the high-core cutechess issue, I am not so sure about sebastronomy. Technologov has several 56-core workers and these seem to run without issues.
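A rough sketch of that manual split: copy the worker directory N times and launch each copy with a lower concurrency. The paths and the exact worker invocation here are assumptions for illustration, not fishtest's documented usage.

```python
# Hypothetical sketch of splitting one high-core worker into subworkers.
import shutil
import subprocess
from pathlib import Path

def split_worker(worker_dir: str, n_subworkers: int, cores_each: int):
    src = Path(worker_dir)
    for i in range(n_subworkers):
        # Copy the whole worker directory (assumed self-contained).
        dst = src.with_name(f"{src.name}-sub{i}")
        shutil.copytree(src, dst, dirs_exist_ok=True)
        # Start the copy with fewer cores (invocation is an assumption).
        subprocess.Popen(
            ["python", "worker.py", "--concurrency", str(cores_each)],
            cwd=dst,
        )
```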
I thought so too... maybe the name should be changed to something like "Clear tasks" or just "Purge tasks"?