Don't save the residuals and their colors in the db. #1974

Merged: 1 commit, Jan 12, 2025

Conversation

vdbergh
Contributor

@vdbergh vdbergh commented May 1, 2024

The main reason for this PR is that currently the residuals are updated in tests_view. This is wrong. tests_view is a GET request. It is not allowed to change the server state.

So if we want updated residuals then we need to update them during POST requests, such as update_task and failed_task. However this may be CPU intensive if a fleet of workers disconnects. We could throttle the updates but this requires some extra state that needs to be maintained (the time of the last update).

So the alternative is to compute the residuals on the fly when they are displayed. This is what this PR does. This solves, for example, issue #1948. It should now again be possible to serve the tasks from a secondary instance.
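
To illustrate what "on the fly" means here, the following is a minimal sketch and not the Fishtest implementation. It derives a z-score-like residual per task from pooled win/draw/loss counts each time it is called and never writes anything back to the tasks. The function name `residuals_on_the_fly`, the `stats` layout and the statistic itself are assumptions made for this example; the statistic Fishtest actually uses may differ.

```python
import math


def residuals_on_the_fly(tasks):
    """Return one residual per task without modifying `tasks`.

    Hypothetical helper: the residual is the standardized deviation of a
    task's score from the score expected under the pooled results.
    """
    wins = sum(t["stats"]["wins"] for t in tasks)
    draws = sum(t["stats"]["draws"] for t in tasks)
    losses = sum(t["stats"]["losses"] for t in tasks)
    games = wins + draws + losses
    if games == 0:
        return [0.0 for _ in tasks]

    # Pooled per-game mean and variance of the score (win=1, draw=0.5, loss=0).
    mean = (wins + 0.5 * draws) / games
    variance = (wins + 0.25 * draws) / games - mean**2

    residuals = []
    for t in tasks:
        s = t["stats"]
        n = s["wins"] + s["draws"] + s["losses"]
        if n == 0 or variance <= 0:
            # Nothing to compare against: report a neutral residual.
            residuals.append(0.0)
            continue
        points = s["wins"] + 0.5 * s["draws"]
        residuals.append((points - n * mean) / math.sqrt(n * variance))
    return residuals
```

Because the function only reads its input, a view serving a GET request could call it safely, and so could a secondary instance.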

Computing residuals on the fly is not possible for bad tasks. So for bad tasks we record the residual and its color from the time they became bad. However the color is saved symbolically as green, yellow, red, instead of as an actual hex-code.
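
A rough sketch of the bad-task case, under stated assumptions: the call shape `residual_to_color(residual, chi2)` matches the one visible in the tracebacks further down this thread, but the threshold keys `z_95` and `z_99`, the field name `residual_color` and the helper `freeze_bad_task` are hypothetical and chosen only for illustration.

```python
def residual_to_color(residual, chi2):
    """Map a residual to a symbolic color name instead of a hex code.

    `chi2` is assumed to carry two thresholds under the (hypothetical)
    keys "z_95" and "z_99".
    """
    if abs(residual) < chi2["z_95"]:
        return "green"
    if abs(residual) < chi2["z_99"]:
        return "yellow"
    return "red"


def freeze_bad_task(task, residual, chi2):
    """Return a copy of `task` with its residual and color recorded.

    The values are frozen at the moment the task is declared bad, since
    they cannot be recomputed later.
    """
    bad_task = dict(task)
    bad_task["bad"] = True
    bad_task["residual"] = residual
    bad_task["residual_color"] = residual_to_color(residual, chi2)
    return bad_task
```

Storing the symbolic name keeps presentation details out of the database; the translation to an actual color can happen wherever the task is rendered.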

This PR is almost totally untested because

  • testing the purging code is tricky;
  • I lack the resources to generate tasks with meaningful residual

@vdbergh
Contributor Author

vdbergh commented May 1, 2024

Of course we should use css to translate the symbolic colors green, yellow, red into actual hex-colors via an appropriate class instead of hard coding this translation in the Python code. Something for a follow up PR...

@vdbergh vdbergh force-pushed the purge2 branch 3 times, most recently from fa402bb to 74cfbbe Compare May 1, 2024 15:59
@ppigazzini
Collaborator

ppigazzini commented May 1, 2024

Consider if the code could be clearer using Enum or StrEnum (I don't know if it's compatible with vtjson)
https://docs.python.org/3/library/enum.html
note: PROD is running with python 3.12.3

@vdbergh
Contributor Author

vdbergh commented May 1, 2024

We could use an Enum residual_color and use that in the schema (it is just a type), but Enums aren't used anywhere else in the Fishtest codebase. For now I would prefer to keep things more or less as they are...

What exactly is unclear in the code?

@vdbergh vdbergh force-pushed the purge2 branch 2 times, most recently from d92dac5 to e7b3e20 Compare May 2, 2024 02:24
@peregrineshahin
Contributor

Any update on the mergeability of this? It looks to me like a free optimization.

@ppigazzini
Collaborator

This PR is almost totally untested because

  • testing the purging code is tricky;
  • I lack the resources to generate tasks with meaningful residual

@peregrineshahin these notes are valid for me too. I need a window of free time to process a run with non-homogeneous workers (usually I schedule a time-odds run and hack the code to switch the new/test times in some workers).

@ppigazzini ppigazzini marked this pull request as draft May 31, 2024 11:59
@vdbergh
Contributor Author

vdbergh commented Jan 8, 2025

Rebased.

@vdbergh vdbergh marked this pull request as ready for review January 8, 2025 20:40
@ppigazzini ppigazzini added enhancement server server side changes labels Jan 9, 2025
@ppigazzini
Collaborator

DEV running with the PR.
The finished tests have the same residuals on PROD and DEV (checked the old ones as well).
There are some ongoing tests with time odds and a few bad workers with the time reversed to get bad residuals and a low p-value, see
https://dfts-0.pigazzini.it/tests/view/6783db6ef90a2cc8d20e5a43

@vdbergh
Contributor Author

vdbergh commented Jan 12, 2025

DEV running with the PR. The finished tests have the same residuals on PROD and DEV (checked the old ones as well). There are some ongoing tests with time odds and a few bad workers with the time reversed to get bad residuals and a low p-value, see https://dfts-0.pigazzini.it/tests/view/6783db6ef90a2cc8d20e5a43

Thanks! Comparing with PROD was a good idea! I noticed the test on DEV with dramatically bad residuals 😄

@ppigazzini
Collaborator

ppigazzini commented Jan 12, 2025

Internal server error while purging https://dfts-0.pigazzini.it/tests/view/6783db7ff90a2cc8d20e5a45

Click to view
Jan 12 17:59:37 dfts-0 pserve[90464]: The run object 67823a24a31c4c13e83518a8 does not validate: run['bad_tasks'] is missing (this is likely not an error as the run object has an older version)
Jan 12 18:00:59 dfts-0 pserve[90464]: 2025-01-12 18:00:59,776 ERROR [waitress][waitress-5] Exception while serving /tests/purge
Jan 12 18:00:59 dfts-0 pserve[90464]: Traceback (most recent call last):
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 13, in _error_handler
Jan 12 18:00:59 dfts-0 pserve[90464]:     response = request.invoke_exception_view(exc_info)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/view.py", line 786, in invoke_exception_view
Jan 12 18:00:59 dfts-0 pserve[90464]:     raise HTTPNotFound
Jan 12 18:00:59 dfts-0 pserve[90464]: pyramid.httpexceptions.HTTPNotFound: The resource could not be found.
Jan 12 18:00:59 dfts-0 pserve[90464]: During handling of the above exception, another exception occurred:
Jan 12 18:00:59 dfts-0 pserve[90464]: Traceback (most recent call last):
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/channel.py", line 430, in service
Jan 12 18:00:59 dfts-0 pserve[90464]:     task.service()
Jan 12 18:00:59 dfts-0 pserve[90464]:     ~~~~~~~~~~~~^^
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/task.py", line 167, in service
Jan 12 18:00:59 dfts-0 pserve[90464]:     self.execute()
Jan 12 18:00:59 dfts-0 pserve[90464]:     ~~~~~~~~~~~~^^
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/task.py", line 435, in execute
Jan 12 18:00:59 dfts-0 pserve[90464]:     app_iter = self.channel.server.application(environ, start_response)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/proxy_headers.py", line 64, in translate_proxy_headers
Jan 12 18:00:59 dfts-0 pserve[90464]:     return app(environ, start_response)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 270, in __call__
Jan 12 18:00:59 dfts-0 pserve[90464]:     response = self.execution_policy(environ, self)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 276, in default_execution_policy
Jan 12 18:00:59 dfts-0 pserve[90464]:     return router.invoke_request(request)
Jan 12 18:00:59 dfts-0 pserve[90464]:            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 245, in invoke_request
Jan 12 18:00:59 dfts-0 pserve[90464]:     response = handle_request(request)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 43, in excview_tween
Jan 12 18:00:59 dfts-0 pserve[90464]:     response = _error_handler(request, exc)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 17, in _error_handler
Jan 12 18:00:59 dfts-0 pserve[90464]:     reraise(*exc_info)
Jan 12 18:00:59 dfts-0 pserve[90464]:     ~~~~~~~^^^^^^^^^^^
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/util.py", line 733, in reraise
Jan 12 18:00:59 dfts-0 pserve[90464]:     raise value
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 41, in excview_tween
Jan 12 18:00:59 dfts-0 pserve[90464]:     response = handler(request)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 143, in handle_request
Jan 12 18:00:59 dfts-0 pserve[90464]:     response = _call_view(
Jan 12 18:00:59 dfts-0 pserve[90464]:         registry, request, context, context_iface, view_name
Jan 12 18:00:59 dfts-0 pserve[90464]:     )
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/view.py", line 674, in _call_view
Jan 12 18:00:59 dfts-0 pserve[90464]:     response = view_callable(context, request)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/config/views.py", line 170, in attr_view
Jan 12 18:00:59 dfts-0 pserve[90464]:     return view(context, request)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/config/views.py", line 196, in predicate_wrapper
Jan 12 18:00:59 dfts-0 pserve[90464]:     return view(context, request)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 512, in csrf_view
Jan 12 18:00:59 dfts-0 pserve[90464]:     return view(context, request)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 392, in viewresult_to_response
Jan 12 18:00:59 dfts-0 pserve[90464]:     result = view(context, request)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 141, in _requestonly_view
Jan 12 18:00:59 dfts-0 pserve[90464]:     response = view(request)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/fishtest/views.py", line 1376, in tests_purge
Jan 12 18:00:59 dfts-0 pserve[90464]:     message = request.rundb.purge_run(run, p=0.01, res=4.5)
Jan 12 18:00:59 dfts-0 pserve[90464]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1670, in purge_run
Jan 12 18:00:59 dfts-0 pserve[90464]:     residual_color = residual_to_color(bad_task["residual"], chi2)
Jan 12 18:00:59 dfts-0 pserve[90464]:                                        ^^^^^^^^
Jan 12 18:00:59 dfts-0 pserve[90464]: NameError: name 'bad_task' is not defined

@vdbergh
Contributor Author

vdbergh commented Jan 12, 2025

I'll fix it after dinner. The new schema makes "bad_tasks" mandatory (I want to reduce the number of optional fields), but of course we still need to deal with old runs.
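
One way to deal with old runs when reading, sketched under assumptions: the helper names `get_bad_tasks` and `stored_residual` and the field name `residual_color` are hypothetical, and this is not necessarily how the PR handles it. The field name `bad_tasks` is the one reported by the validation message in the log above.

```python
def get_bad_tasks(run):
    """Return the run's bad tasks, treating a missing field as empty.

    Runs created before the schema change have no "bad_tasks" key.
    """
    return run.get("bad_tasks", [])


def stored_residual(bad_task):
    """Return (residual, color) for a bad task, or (None, None) if absent.

    Using .get() avoids a KeyError for tasks purged before the residual
    was recorded alongside them.
    """
    return bad_task.get("residual"), bad_task.get("residual_color")
```

With helpers like these, display code can treat old and new run objects uniformly and decide separately what to show when no residual was recorded.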

@ppigazzini
Collaborator

ppigazzini commented Jan 12, 2025

I created that run after merging the PR, so it should use the new schema.
Enjoy your dinner ;)

@vdbergh
Contributor Author

vdbergh commented Jan 12, 2025

You are right. It was a different problem. It should be fixed now. I also rebased.

@ppigazzini
Collaborator

I'm reloading a clean DB, then I will make a couple of purgeable runs with master and finally some purgeable runs with the PRs.

@ppigazzini
Collaborator

ppigazzini commented Jan 12, 2025

Internal server error purging a run with the new schema
https://dfts-0.pigazzini.it/tests/view/67841f0058b47b9b05385f19

Click to view
Jan 12 21:13:42 dfts-0 pserve[94596]: Validate_data_structures: validating Fishtest's internal data structures...
Jan 12 21:14:15 dfts-0 pserve[94596]: Clean_wtt_map: 9 active workers...
Jan 12 21:14:15 dfts-0 pserve[94596]: The run object 677d8ebde9d3aa82e63be80f does not validate: run['bad_tasks'] is missing (this is likely not an error as the run object has an older version)
Jan 12 21:17:09 dfts-0 pserve[94596]: The run object 6781a8faf21a6e1e7b856f37 does not validate: run['bad_tasks'] is missing (this is likely not an error as the run object has an older version)
Jan 12 21:20:03 dfts-0 pserve[94596]: Clean_wtt_map: 10 active workers...
Jan 12 21:20:03 dfts-0 pserve[94596]: Validate_random_run: validated cache run 67841f1658b47b9b05385f1d...
Jan 12 21:21:34 dfts-0 pserve[94596]: 2025-01-12 21:21:34,958 ERROR [waitress][waitress-7] Exception while serving /tests/purge
Jan 12 21:21:34 dfts-0 pserve[94596]: Traceback (most recent call last):
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 13, in _error_handler
Jan 12 21:21:34 dfts-0 pserve[94596]:     response = request.invoke_exception_view(exc_info)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/view.py", line 786, in invoke_exception_view
Jan 12 21:21:34 dfts-0 pserve[94596]:     raise HTTPNotFound
Jan 12 21:21:34 dfts-0 pserve[94596]: pyramid.httpexceptions.HTTPNotFound: The resource could not be found.
Jan 12 21:21:34 dfts-0 pserve[94596]: During handling of the above exception, another exception occurred:
Jan 12 21:21:34 dfts-0 pserve[94596]: Traceback (most recent call last):
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/channel.py", line 430, in service
Jan 12 21:21:34 dfts-0 pserve[94596]:     task.service()
Jan 12 21:21:34 dfts-0 pserve[94596]:     ~~~~~~~~~~~~^^
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/task.py", line 167, in service
Jan 12 21:21:34 dfts-0 pserve[94596]:     self.execute()
Jan 12 21:21:34 dfts-0 pserve[94596]:     ~~~~~~~~~~~~^^
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/task.py", line 435, in execute
Jan 12 21:21:34 dfts-0 pserve[94596]:     app_iter = self.channel.server.application(environ, start_response)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/proxy_headers.py", line 64, in translate_proxy_headers
Jan 12 21:21:34 dfts-0 pserve[94596]:     return app(environ, start_response)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 270, in __call__
Jan 12 21:21:34 dfts-0 pserve[94596]:     response = self.execution_policy(environ, self)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 276, in default_execution_policy
Jan 12 21:21:34 dfts-0 pserve[94596]:     return router.invoke_request(request)
Jan 12 21:21:34 dfts-0 pserve[94596]:            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 245, in invoke_request
Jan 12 21:21:34 dfts-0 pserve[94596]:     response = handle_request(request)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 43, in excview_tween
Jan 12 21:21:34 dfts-0 pserve[94596]:     response = _error_handler(request, exc)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 17, in _error_handler
Jan 12 21:21:34 dfts-0 pserve[94596]:     reraise(*exc_info)
Jan 12 21:21:34 dfts-0 pserve[94596]:     ~~~~~~~^^^^^^^^^^^
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/util.py", line 733, in reraise
Jan 12 21:21:34 dfts-0 pserve[94596]:     raise value
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 41, in excview_tween
Jan 12 21:21:34 dfts-0 pserve[94596]:     response = handler(request)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 143, in handle_request
Jan 12 21:21:34 dfts-0 pserve[94596]:     response = _call_view(
Jan 12 21:21:34 dfts-0 pserve[94596]:         registry, request, context, context_iface, view_name
Jan 12 21:21:34 dfts-0 pserve[94596]:     )
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/view.py", line 674, in _call_view
Jan 12 21:21:34 dfts-0 pserve[94596]:     response = view_callable(context, request)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/config/views.py", line 170, in attr_view
Jan 12 21:21:34 dfts-0 pserve[94596]:     return view(context, request)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/config/views.py", line 196, in predicate_wrapper
Jan 12 21:21:34 dfts-0 pserve[94596]:     return view(context, request)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 512, in csrf_view
Jan 12 21:21:34 dfts-0 pserve[94596]:     return view(context, request)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 392, in viewresult_to_response
Jan 12 21:21:34 dfts-0 pserve[94596]:     result = view(context, request)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 141, in _requestonly_view
Jan 12 21:21:34 dfts-0 pserve[94596]:     response = view(request)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/fishtest/views.py", line 1376, in tests_purge
Jan 12 21:21:34 dfts-0 pserve[94596]:     message = request.rundb.purge_run(run, p=0.01, res=4.5)
Jan 12 21:21:34 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1670, in purge_run
Jan 12 21:21:34 dfts-0 pserve[94596]:     residual_color = residual_to_color(task["residual"], chi2)
Jan 12 21:21:34 dfts-0 pserve[94596]:                                        ~~~~^^^^^^^^^^^^
Jan 12 21:21:34 dfts-0 pserve[94596]: KeyError: 'residual'

Internal server error purging a run with the old schema
https://dfts-0.pigazzini.it/tests/view/678411de217c36aa17a0d411

Click to view
Jan 12 21:23:37 dfts-0 pserve[94596]: Traceback (most recent call last):
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 13, in _error_handler
Jan 12 21:23:37 dfts-0 pserve[94596]:     response = request.invoke_exception_view(exc_info)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/view.py", line 786, in invoke_exception_view
Jan 12 21:23:37 dfts-0 pserve[94596]:     raise HTTPNotFound
Jan 12 21:23:37 dfts-0 pserve[94596]: pyramid.httpexceptions.HTTPNotFound: The resource could not be found.
Jan 12 21:23:37 dfts-0 pserve[94596]: During handling of the above exception, another exception occurred:
Jan 12 21:23:37 dfts-0 pserve[94596]: Traceback (most recent call last):
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/channel.py", line 430, in service
Jan 12 21:23:37 dfts-0 pserve[94596]:     task.service()
Jan 12 21:23:37 dfts-0 pserve[94596]:     ~~~~~~~~~~~~^^
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/task.py", line 167, in service
Jan 12 21:23:37 dfts-0 pserve[94596]:     self.execute()
Jan 12 21:23:37 dfts-0 pserve[94596]:     ~~~~~~~~~~~~^^
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/task.py", line 435, in execute
Jan 12 21:23:37 dfts-0 pserve[94596]:     app_iter = self.channel.server.application(environ, start_response)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/waitress/proxy_headers.py", line 64, in translate_proxy_headers
Jan 12 21:23:37 dfts-0 pserve[94596]:     return app(environ, start_response)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 270, in __call__
Jan 12 21:23:37 dfts-0 pserve[94596]:     response = self.execution_policy(environ, self)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 276, in default_execution_policy
Jan 12 21:23:37 dfts-0 pserve[94596]:     return router.invoke_request(request)
Jan 12 21:23:37 dfts-0 pserve[94596]:            ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 245, in invoke_request
Jan 12 21:23:37 dfts-0 pserve[94596]:     response = handle_request(request)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 43, in excview_tween
Jan 12 21:23:37 dfts-0 pserve[94596]:     response = _error_handler(request, exc)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 17, in _error_handler
Jan 12 21:23:37 dfts-0 pserve[94596]:     reraise(*exc_info)
Jan 12 21:23:37 dfts-0 pserve[94596]:     ~~~~~~~^^^^^^^^^^^
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/util.py", line 733, in reraise
Jan 12 21:23:37 dfts-0 pserve[94596]:     raise value
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/tweens.py", line 41, in excview_tween
Jan 12 21:23:37 dfts-0 pserve[94596]:     response = handler(request)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/router.py", line 143, in handle_request
Jan 12 21:23:37 dfts-0 pserve[94596]:     response = _call_view(
Jan 12 21:23:37 dfts-0 pserve[94596]:         registry, request, context, context_iface, view_name
Jan 12 21:23:37 dfts-0 pserve[94596]:     )
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/view.py", line 674, in _call_view
Jan 12 21:23:37 dfts-0 pserve[94596]:     response = view_callable(context, request)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/config/views.py", line 170, in attr_view
Jan 12 21:23:37 dfts-0 pserve[94596]:     return view(context, request)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/config/views.py", line 196, in predicate_wrapper
Jan 12 21:23:37 dfts-0 pserve[94596]:     return view(context, request)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 512, in csrf_view
Jan 12 21:23:37 dfts-0 pserve[94596]:     return view(context, request)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 392, in viewresult_to_response
Jan 12 21:23:37 dfts-0 pserve[94596]:     result = view(context, request)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/env/lib/python3.13/site-packages/pyramid/viewderivers.py", line 141, in _requestonly_view
Jan 12 21:23:37 dfts-0 pserve[94596]:     response = view(request)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/fishtest/views.py", line 1376, in tests_purge
Jan 12 21:23:37 dfts-0 pserve[94596]:     message = request.rundb.purge_run(run, p=0.01, res=4.5)
Jan 12 21:23:37 dfts-0 pserve[94596]:   File "/home/usr00/fishtest/server/fishtest/rundb.py", line 1670, in purge_run
Jan 12 21:23:37 dfts-0 pserve[94596]:     residual_color = residual_to_color(task["residual"], chi2)
Jan 12 21:23:37 dfts-0 pserve[94596]:                                        ~~~~^^^^^^^^^^^^
Jan 12 21:23:37 dfts-0 pserve[94596]: KeyError: 'residual'

@vdbergh
Contributor Author

vdbergh commented Jan 12, 2025

OMG. The same line.

@vdbergh
Contributor Author

vdbergh commented Jan 12, 2025

Another attempt. I am really unable to write correct code without testing.

@vdbergh
Contributor Author

vdbergh commented Jan 12, 2025

A static type checker would have caught this...

@ppigazzini
Collaborator

Purged both runs!

@ppigazzini
Collaborator

There are other purgeable runs (master and PR), they start with "#1974" if you want to check.

@vdbergh
Contributor Author

vdbergh commented Jan 12, 2025

There are other purgeable runs (master and PR), they start with "#1974" if you want to check.

I purged some and looked at the tasks (with the get_task api) and everything seems to be in order.

I also looked at some old purged runs and they display correctly.

Collaborator

@ppigazzini ppigazzini left a comment

Looks good on DEV.

@ppigazzini ppigazzini merged commit d9a9750 into official-stockfish:master Jan 12, 2025
22 checks passed
@ppigazzini
Collaborator

A static type checker would have caught this...

PROD is running with python 3.12.8, DEV is already running with python 3.13.1.
The server code can be typed using the latest syntax.

@ppigazzini
Collaborator

PROD updated, thank you @vdbergh :)

Labels
enhancement server server side changes
3 participants