Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve logic for restarting kernels with new ports during initial startup #348

Open
wants to merge 65 commits into
base: main
Choose a base branch
from

Conversation

skukhtichev
Copy link

The proposal defines the logic in restarter.py for setting _initial_startup value to False only when the kernel sends kernel_info reply. Introduced changes solve issues described in #347. The introduced changes allow to identify issues with shell channel (e.g. due to the ZMQError: Address already in use issue)
The proposal also defines logic for notifying client (Notebook Server, Kernel gateway, etc.) that the kernel was restarted with new ports.
The Proposal includes the following changes:

  • Add the logic for monitoring shell channel:
  • Configurable startup_time parameter is introduced. This parameter defines the timeout for kernel_reply response. Default value is set to 0 (kernel_info reply monitoring disables). It is disabled by default, because different kernels (e.g. kernels with spark context) require different time before the kernel will be ready.
  • If startup_time > 0 then kernel_info request is sent to kernel with the first execution of restarter's poll function
  • Every time when poll function is executed the kernel_info response from shell channel is checked (within startup_time). It is done to prevent application blocking while waiting for the shell messages
  • After receiving kernel_info reply the kernel monitor is disabled
  • If the kernel_info reply is not received within startup_time then kernel is restarted with new ports and restart callback with newports=True parameter is executed
  • Add **kwargs support for registered callback: callback(**kwargs). If additional parameters are not supported then callback() will be executed.
  • When the kernel is restarted then newports parameter indicating whether new ports are initialised is sent. Client could apply the action based on this information
  • Restart callback is fired after kernel_manager.restart_kernel function is executed. It is required because a connection file with new ports is created during kernel restart

takluyver and others added 30 commits October 9, 2017 15:23
MetaKernelFinder -> KernelFinder
Prototype new kernel discovery machinery
The old URL points to a "This page has moved"-page
Updated URL for Jupyter Kernels in other languages
- use IOLoop.current over IOLoop.instance
- drop removed `loop` arg from PeriodicCallback
- deprecate now-unused IOLoopKernelRestarter.loop
- interrupt_mode="signal" is the default and current behaviour
- With interrupt_mode="message", instead of a signal, a
  `interrupt_request` message on the control port will be sent
Additional to the actual signal, send a message on the control port
this should allow ipykernel's wheel-installed specs to specify `python3` or `python2` and prevent
python2 kernels from launching with sys.executable if the Python version is 3.
A simple lead in to the 'kernel nanny' work, this adds a command so you
can do:
jupyter kernel --kernel python
Carreau and others added 24 commits December 15, 2017 21:17
…ance

Improve performance of get_kernel_spec
Tolerate invalid kernel specs in get_all_specs()
Start writing release notes for 5.2
Du to a likely bug in wheel, the conditional dependency on pytest ends
up being unconditional. Seem like adding parenthesis fix that (as a work
around).

See pypa/setuptools#1242
Closes jupyter#324
Parenthesize conditional requirement in setup.py
This shrinks the sdist from 2MB to ~250KB... just realized that after
uploading 5.2.1 took way too long. Apparently 5.2.0 alsho shipped built
docs.
to help inqiure on this jupyter#329
Use the ability to exclude branches as describe there:

 - https://docs.travis-ci.com/user/customizing-the-build/#Safelisting-or-blocklisting-branches

Relatively easy as MrMeeseeks push a known branch format.
This of course cannot be tested until merged and backported, and another
backport triggered.
Tell Travis not to test the push from MrMeeseeks
we could probably avoid this if we registered/unregistered atexit callbacks for instances
instead of registering it once for classes at import time
handle classes having been torn down in atexit
…ation. Made kernel_monitor_enabled non configurable
@takluyver
Copy link
Member

@rolweber - you introduced the 'restart with new ports' machinery in #279. Do you have time to look at this and the accompanying PR jupyter/notebook#3284 ?

@rolweber
Copy link
Contributor

rolweber commented Feb 7, 2018

@takluyver: Unfortunately, I don't have the time. But Sergey is sitting in the same office as I am, so it wouldn't be an independent review anyway :-)

When I introduced the original machinery, I thought that the liveness check is based on heartbeat messages from the kernel. As I learned later, heartbeat messages are optional, and the liveness check really just indicates that the pipes to the started process are still up. Therefore, the condition I checked was meaningless for the intended purpose. Sergey picked up the task of implementing something that actually works, and this PR is the result of his efforts.

@takluyver
Copy link
Member

So with the default settings (startup_time=0.0), the new mechanism is not used, right? If I'm reading the code correctly, that means it will always reselect random ports when restarting the kernel. It also means some of the names are fairly confusing for the default case (e.g. _initial_startup is always True, random_ports_until_alive doesn't really have an 'until alive' clause).

When I set it to actually use the new code, it throws errors in the Qt console because the shell channel is lacking the get_msg() method. The presence of that depends on which client class is in use, which in turn depends on the manager in use... I'm trying to make new APIs to decouple these pieces better (#308).

I like the addition of a way to notify other components that the kernel has restarted with new ports. But if the application can handle that (as the notebook will be able to with your corresponding PR), the extra complexity to decide when to use it seems superfluous - we can just use new ports on every restart.

So I'd say:

  • Get rid of is_kernel_response_timedout and the distinction between 'initial startup' and 'alive', which isn't used by default anyway.
  • The default settings for KernelRestarter should try to restart a kernel using the same ports, to keep things working for frontends that don't yet know about reconnecting to new ports.
  • Frontends that do know about reconnecting to new ports should be able to pass an option when creating a kernel manager, to use new ports on any restart.
  • Maybe look at using backcall to add parameters to the restart callback in a way that doesn't silence unrelated TypeErrors.

@skukhtichev
Copy link
Author

@takluyver Thank you for your review. I will update the pull request regarding your proposals. By default startup_time parameter equals 0.0. It means that the shell channel monitoring is disabled. The idea of introducing shell channel monitor is to identify the cases when the kernel is not responding (e.g. kernel clients could not catch ZMQError: Address already in use error).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants