Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce log_message action. #1906

Merged
merged 1 commit into from
Mar 4, 2024

Conversation

vdbergh
Copy link
Contributor

@vdbergh vdbergh commented Mar 1, 2024

This may be used to give internal errors in the logic of Fishtest maximum visibility.

This action is currently used in two places:

  • if a finished run does not validate;

  • if check_results finds a discrepancy between the incrementally updated results and the computed results in a finished run.

Currently both errors are logged but otherwise they are treated silently.

Another application in the future might be to log certain worker errors which happen before a task is requested (e.g. failure to be able to run cutechess-cli).

While at it, replace all occurences of _validate(), which returns an error string on non-validation, by the more Pythonic validate(), which throws an exception instead.

@ppigazzini ppigazzini added enhancement server server side changes labels Mar 4, 2024
@ppigazzini
Copy link
Collaborator

ppigazzini commented Mar 4, 2024

The PR on DEV seems to run fine.

But the process breaks trying to trigger the new code with:

diff --git a/server/fishtest/rundb.py b/server/fishtest/rundb.py
index 888eae1..a6408ac 100644
--- a/server/fishtest/rundb.py
+++ b/server/fishtest/rundb.py
@@ -1487,7 +1487,8 @@ After fixing the issues you can unblock the worker at

         # Log any discrepancies between incremented and recalculated results
         for s in ["wins", "losses", "draws", "crashes", "time_losses"]:
-            if old.get(s, -1) != new.get(s, -1):
+            #if old.get(s, -1) != new.get(s, -1):
+            if old.get(s, -1) == new.get(s, -1):
                 info = "Check_results: task {}/{} {} results mismatch: {}/{}".format(
                     run_id, task_id, s, old.get(s, -1), new.get(s, -1)
                 )
@@ -1518,7 +1519,8 @@ After fixing the issues you can unblock the worker at
             for i, (old_value, new_value) in enumerate(
                 zip(old["pentanomial"], new["pentanomial"])
             ):
-                if old_value != new_value:
+                #if old_value != new_value:
+                if old_value == new_value:
                     info = "Check_results: task {}/{} pentanomial value {} results mismatch: {}/{}".format(
                         run_id, task_id, i, old_value, new_value
                     )

The server fails to stop the run:
image

No new event type in the Events log:
image

Errors in server log:

Mar 04 11:56:14 dfts-0 pserve[10382]: Traceback (most recent call last):
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/tweens.py", line 13, in _error_handler
Mar 04 11:56:14 dfts-0 pserve[10382]:     response = request.invoke_exception_view(exc_info)
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/view.py", line 786, in invoke_exception_view
Mar 04 11:56:14 dfts-0 pserve[10382]:     raise HTTPNotFound
Mar 04 11:56:14 dfts-0 pserve[10382]: pyramid.httpexceptions.HTTPNotFound: The resource could not be found.
Mar 04 11:56:14 dfts-0 pserve[10382]: During handling of the above exception, another exception occurred:
Mar 04 11:56:14 dfts-0 pserve[10382]: Traceback (most recent call last):
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/waitress/channel.py", line 428, in service
Mar 04 11:56:14 dfts-0 pserve[10382]:     task.service()
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/waitress/task.py", line 168, in service
Mar 04 11:56:14 dfts-0 pserve[10382]:     self.execute()
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/waitress/task.py", line 436, in execute
Mar 04 11:56:14 dfts-0 pserve[10382]:     app_iter = self.channel.server.application(environ, start_response)
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/waitress/proxy_headers.py", line 64, in translate_proxy_headers
Mar 04 11:56:14 dfts-0 pserve[10382]:     return app(environ, start_response)
Mar 04 11:56:14 dfts-0 pserve[10382]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/router.py", line 270, in __call__
Mar 04 11:56:14 dfts-0 pserve[10382]:     response = self.execution_policy(environ, self)
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/router.py", line 276, in default_execution_policy
Mar 04 11:56:14 dfts-0 pserve[10382]:     return router.invoke_request(request)
Mar 04 11:56:14 dfts-0 pserve[10382]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/router.py", line 245, in invoke_request
Mar 04 11:56:14 dfts-0 pserve[10382]:     response = handle_request(request)
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/tweens.py", line 43, in excview_tween
Mar 04 11:56:14 dfts-0 pserve[10382]:     response = _error_handler(request, exc)
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/tweens.py", line 17, in _error_handler
Mar 04 11:56:14 dfts-0 pserve[10382]:     reraise(*exc_info)
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/util.py", line 733, in reraise
Mar 04 11:56:14 dfts-0 pserve[10382]:     raise value
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/tweens.py", line 41, in excview_tween
Mar 04 11:56:14 dfts-0 pserve[10382]:     response = handler(request)
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/router.py", line 143, in handle_request
Mar 04 11:56:14 dfts-0 pserve[10382]:     response = _call_view(
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/view.py", line 674, in _call_view
Mar 04 11:56:14 dfts-0 pserve[10382]:     response = view_callable(context, request)
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/viewderivers.py", line 427, in rendered_view
Mar 04 11:56:14 dfts-0 pserve[10382]:     result = view(context, request)
Mar 04 11:56:14 dfts-0 pserve[10382]:              ^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/pyramid/viewderivers.py", line 113, in _class_requestonly_view
Mar 04 11:56:14 dfts-0 pserve[10382]:     response = getattr(inst, attr)()
Mar 04 11:56:14 dfts-0 pserve[10382]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/fishtest/api.py", line 554, in update_task
Mar 04 11:56:14 dfts-0 pserve[10382]:     result = self.request.rundb.update_task(
Mar 04 11:56:14 dfts-0 pserve[10382]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1322, in update_task
Mar 04 11:56:14 dfts-0 pserve[10382]:     return self.sync_update_task(worker_info, run_id, task_id, stats, spsa)
Mar 04 11:56:14 dfts-0 pserve[10382]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1463, in sync_update_task
Mar 04 11:56:14 dfts-0 pserve[10382]:     self.check_results(run, run_id, task_id)
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1495, in check_results
Mar 04 11:56:14 dfts-0 pserve[10382]:     self.actiondb.log_message(
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/fishtest/actiondb.py", line 367, in log_message
Mar 04 11:56:14 dfts-0 pserve[10382]:     self.insert_action(
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/fishtest/actiondb.py", line 377, in insert_action
Mar 04 11:56:14 dfts-0 pserve[10382]:     validate(schema, action, "action")  # may raise exception
Mar 04 11:56:14 dfts-0 pserve[10382]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 11:56:14 dfts-0 pserve[10382]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/vtjson.py", line 344, in validate
Mar 04 11:56:14 dfts-0 pserve[10382]:     raise ValidationError(message)
Mar 04 11:56:14 dfts-0 pserve[10382]: vtjson.ValidationError: action['action'] (value:'log_message') is not equal to 'failed_task' and action['action'] (value:'log_message') is not equal to 'crash_or_time' and action['message'] is not in

@vdbergh
Copy link
Contributor Author

vdbergh commented Mar 4, 2024

Hmm. I will try to duplicate it tomorrow.

@ppigazzini
Copy link
Collaborator

ppigazzini commented Mar 4, 2024

I tested if this fixes the problem:

diff --git a/server/fishtest/actiondb.py b/server/fishtest/actiondb.py
index f12573b..a832128 100644
--- a/server/fishtest/actiondb.py
+++ b/server/fishtest/actiondb.py
@@ -156,6 +156,7 @@ schema = union(
         "message": union("blocked", "unblocked"),
     },
     {
+        "_id?": ObjectId,
         "action": "log_message",
         "username": str,
         "message": str,

But the issue is still there:

Mar 04 12:48:19 dfts-0 pserve[14185]:                ^^^^^^^^^^^^^^^^^^^^^
Mar 04 12:48:19 dfts-0 pserve[14185]:   File "/home/usr00/fishtest/server/fishtest/api.py", line 554, in update_task
Mar 04 12:48:19 dfts-0 pserve[14185]:     result = self.request.rundb.update_task(
Mar 04 12:48:19 dfts-0 pserve[14185]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 12:48:19 dfts-0 pserve[14185]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1322, in update_task
Mar 04 12:48:19 dfts-0 pserve[14185]:     return self.sync_update_task(worker_info, run_id, task_id, stats, spsa)
Mar 04 12:48:19 dfts-0 pserve[14185]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 12:48:19 dfts-0 pserve[14185]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1463, in sync_update_task
Mar 04 12:48:19 dfts-0 pserve[14185]:     self.check_results(run, run_id, task_id)
Mar 04 12:48:19 dfts-0 pserve[14185]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1495, in check_results
Mar 04 12:48:19 dfts-0 pserve[14185]:     self.actiondb.log_message(
Mar 04 12:48:19 dfts-0 pserve[14185]:   File "/home/usr00/fishtest/server/fishtest/actiondb.py", line 368, in log_message
Mar 04 12:48:19 dfts-0 pserve[14185]:     self.insert_action(
Mar 04 12:48:19 dfts-0 pserve[14185]:   File "/home/usr00/fishtest/server/fishtest/actiondb.py", line 378, in insert_action
Mar 04 12:48:19 dfts-0 pserve[14185]:     validate(schema, action, "action")  # may raise exception
Mar 04 12:48:19 dfts-0 pserve[14185]:     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 04 12:48:19 dfts-0 pserve[14185]:   File "/home/usr00/fishtest/server/env/lib/python3.11/site-packages/vtjson.py", line 344, in validate
Mar 04 12:48:19 dfts-0 pserve[14185]:     raise ValidationError(message)
Mar 04 12:48:19 dfts-0 pserve[14185]: vtjson.ValidationError: action['action'] (value:'log_message') is not equal to 'failed_task' and action['action'] (value:'log_message') is not equal to 'crash_or_time' and action['message'] is not in

This may be used to give internal errors in the logic of
Fishtest maximum visibility.

This action is currently used in two places:

- if a finished run does not validate;

- if check_results finds a discrepancy between the incrementally
updated results and the computed results in a finished run.

Currently both errors are logged but otherwise they are treated silently.

Another application in the future might be to log certain
worker errors which happen before a task is requested (e.g.
failure to be able to run cutechess-cli).

While at it, replace all occurences of _validate(), which returns
an error string on non-validation, by the more Pythonic
validate(), which throws an exception instead.
@vdbergh
Copy link
Contributor Author

vdbergh commented Mar 4, 2024

Latest commit fixes the issue I think.

@ppigazzini
Copy link
Collaborator

ppigazzini commented Mar 4, 2024

Looks good on DEV, triggering the log_message with that trick the server correctly stops the run and writes in the Events log

image

Here is the server log:

Mar 04 16:33:06 dfts-0 pserve[30696]: Request_task: refresh queue
Mar 04 16:34:59 dfts-0 pserve[30696]: Request_task: refresh queue
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 wins results mismatch: 618/618
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 losses results mismatch: 172/172
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 draws results mismatch: 450/450
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 crashes results mismatch: 0/0
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 time_losses results mismatch: 0/0
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 pentanomial value 0 results mismatch: 4/4
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 pentanomial value 1 results mismatch: 47/47
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 pentanomial value 2 results mismatch: 183/183
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 pentanomial value 3 results mismatch: 271/271
Mar 04 16:35:23 dfts-0 pserve[30696]: Check_results: task 65e5e886647801fd33db0f46/2 pentanomial value 4 results mismatch: 115/115
Mar 04 16:35:25 dfts-0 pserve[30696]: Request_task: refresh queue
Mar 04 16:36:35 dfts-0 pserve[30696]: Request_task: refresh queue

@ppigazzini ppigazzini merged commit 3605dd9 into official-stockfish:master Mar 4, 2024
19 checks passed
@ppigazzini
Copy link
Collaborator

PROD updated, thank you @vdbergh :)

@ppigazzini
Copy link
Collaborator

@vdbergh
Copy link
Contributor Author

vdbergh commented Mar 4, 2024

Perhaps it would be best to simply create this user.

@ppigazzini
Copy link
Collaborator

Ok, let me try on DEV

@vdbergh
Copy link
Contributor Author

vdbergh commented Mar 4, 2024

But I am noticing that the run schema needs to be updated :)

@ppigazzini
Copy link
Collaborator

The problem is that a user without CPU/tests contributions is delete after a while.

@ppigazzini
Copy link
Collaborator

But I am noticing that the run schema needs to be updated :)

I updated the schema to account the multiple nets in #1902
But that could be sub optimal to account the already scheduled runs before the switch.

@peregrineshahin
Copy link
Contributor

peregrineshahin commented Mar 4, 2024

The problem is that a user without CPU/tests contributions is delete after a while.

here is a fix for that problem
#1890

@vdbergh
Copy link
Contributor Author

vdbergh commented Mar 4, 2024

But I am noticing that the run schema needs to be updated :)

I updated the schema to account the multiple nets in #1902 But that could be sub optimal to account the already scheduled runs before the switch.

Ok I see. No problem then.

@ppigazzini
Copy link
Collaborator

ppigazzini commented Mar 4, 2024

The problem is that a user without CPU/tests contributions is delete after a while.

here is a fix for that problem #1890

Already reported to the SF maintainers group when I refactored the script, but at time the decision was to be sure to not delete users with some contribution. I made clear that a user get a contribution only if the run finishes.

I warned as well that we already deleted some users with contribution to never finished runs (and perhaps some old users with real contribution):

fishtest/server/utils/delta_update_users.py all
Unable to obtain the port number. Error: -2
full scan
not in userdb: t_username='cturan'; run['_id']=ObjectId('64c21d1edc56e1650a
not in userdb: t_username='EatHose8745'; run['_id']=ObjectId('64b8df79dc56e
not in userdb: t_username='EatHose8745'; run['_id']=ObjectId('64b8589fdc56e
not in userdb: t_username='iamwey'; run['_id']=ObjectId('64b2cf160cdec37b95
not in userdb: t_username='Simmy09'; run['_id']=ObjectId('64ac27a7e0c0e4e9f
not in userdb: t_username='Simmy09'; run['_id']=ObjectId('64ac0454e0c0e4e9f
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a8ced102
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a77ff902
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a756a102
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a80ff502
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a310f73e
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a521f73e
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a7379902
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a08abf3e
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a5ea7e02
not in userdb: t_username='SwashBucklerNJ'; run['_id']=ObjectId('64a7adc102

@peregrineshahin
Copy link
Contributor

Ah I see, I'm dealing with the contributor list as if it is user list because there is no user list.
What we can do is fetch contributors based on our criteria of contribution, and add a list in Users section to view all users.

@view_config(route_name="contributors", renderer="contributors.mak")
def contributors(request):
    users_list = list(request.userdb.user_cache.find())
    users_list.sort(key=lambda k: k["cpu_hours"], reverse=True)
    return {
        "users": users_list,
        "approver": request.has_permission("approve_run"),
    }

@ppigazzini
Copy link
Collaborator

ppigazzini commented Mar 4, 2024

What about reloading the page if clicking on the link /user/fishtest.system, something like this?

diff --git a/server/fishtest/templates/actions.mak b/server/fishtest/templates/actions.mak
index 8a15efb..2f6d84d 100644
--- a/server/fishtest/templates/actions.mak
+++ b/server/fishtest/templates/actions.mak
@@ -177,3 +177,12 @@
   document.getElementById('restrict').value =
     ('${request.GET.get("action") if request.GET.get("action") != None else ''}');
 </script>
+
+<script>
+  document.querySelectorAll('a[href$="/user/fishtest.system"]').forEach(function(link) {
+      link.addEventListener('click', function(event) {
+          event.preventDefault();
+          location.reload();
+      });
+  });
+</script>

@peregrineshahin
Copy link
Contributor

peregrineshahin commented Mar 5, 2024

Better to to not link that with an if statement here while rendering or check if the user exists

<td><a href="/user/${action['user']}">${action['user']}</a></td>

What about reloading the page if clicking on the link /user/fishtest.system, something like this?

diff --git a/server/fishtest/templates/actions.mak b/server/fishtest/templates/actions.mak
index 8a15efb..2f6d84d 100644
--- a/server/fishtest/templates/actions.mak
+++ b/server/fishtest/templates/actions.mak
@@ -177,3 +177,12 @@
   document.getElementById('restrict').value =
     ('${request.GET.get("action") if request.GET.get("action") != None else ''}');
 </script>
+
+<script>
+  document.querySelectorAll('a[href$="/user/fishtest.system"]').forEach(function(link) {
+      link.addEventListener('click', function(event) {
+          event.preventDefault();
+          location.reload();
+      });
+  });
+</script>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement server server side changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants