Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

windows platform bug fixes #2045

Merged
merged 32 commits into from
Nov 2, 2022
Merged

Conversation

majestrate
Copy link
Contributor

@majestrate majestrate commented Nov 1, 2022

replaces #2040 (making new one to not force push to old one)

system layer:

new interface type for interacting with the system layer that hides platform dependent details from caller. this unifies win32 and systemd codepaths.

  • [win32] redid all of the win32 service code as it was crufty
  • [win32] we now send the right messages to win32 service manager when asked
  • [win32] we now shut down correctly on win32 service mode
  • [systemd] moved sd_notify into system layer manager implementation

dns:

windows dns was crashy

win32 platform code:

resolve threading issues in wintun and windivert, both were race conditions on tear down.

misc:

  • rearrange log initialization in daemon/lokinet
  • more log/debug statements

remove dead code:

  • dns win32 platform
  • stray unbound resolver
  • unused constants

tewinget and others added 30 commits October 27, 2022 09:32
We should send STOP_PENDING rather than STOPPED while we aren't yet
actually stopped; STOPPED is already sent when we actually finish
stopping.

Also fixes some silly argc/argv shenanigans, I think.
report status to window service manager when we get and iterogate
message from the service manager.
update comments to reflect these changes.
the win32 and sd_notify components provided a disjointed set of
similar high level functionality so we consolidate these duplicate
code paths into one that has the same lifecycle regardless of platform
to reduce complexity of this feature.

this new component is responsible for reporting state changes to the
system layer and optionally propagating state change to lokinet
requested by the system layer (used by windows service).
- win32_platform.cpp is dead
- win32_platform.hpp is useless

Style changes from clang-tidy warnings:
- remove `virtual` from some definitions that already have `override`
- remove virtual destructor from NetworkInterface because it already has
  a virtual destructor via the base type (and clang-tiny warns about it)
Fixes windows shutdown crashes:

- windivert wasn't handling an ERROR_NO_DATA, which it gets when
  finished handling everything after a shutdown.
- wintun ReadPacket still gets invoked after end_session is called, but
  shouldn't be.  This adds an atomic<bool> to early return.
- fixes up some settings we send for windows service manager notify
If wintun fails it seems to take about 15s, so extend the startup
timeout so that it can fail gracefully (and let us clean up before
exiting).

Also refactors the timeouts to chrono constants.
use `fmt::join` for lokinet version string in status

Co-authored-by: Jason Rhinelander <[email protected]>
Get rid of the --win32-daemon hack (which was removed from the service
itself earlier in this PR, by mistake) and replace it with detection of
the error code for "not running as a service" that windows gives us back
if we try to set up service controller dispatching but aren't a service.
we were calling llarp::Context::HandleSignal from a non mainloop
thread when running as a win32 service. this caused issues with a non
clean destruction.

call our signal handler instead of llarp::Context::HandleSignal
If already below info (e.g. debug) it should stay there; we only want to
*lower* it to info if above info.
We do and should be able to call this multiple times during shutdown to
signal that we are advancing through shutdown.
Occasionally during shutdown windivert will crash because a thread tries
sending after we've called wd::shutdown, which isn't allowed.  Add an
atomic bool to prevent this.
Query->Cancel() will remove the Query, but that introduces a race
condition where unbound may still try to invoke the callback (with a
no-longer-valid pointer) if we do it before the ub_ctx_delete call.

Move to it afterwards so that we only cancel things that unbound didn't
We need to make a copy here because (see comment).
@majestrate majestrate added this to the 0.9.10 milestone Nov 1, 2022
@majestrate majestrate requested a review from jagerman November 1, 2022 15:25
we_changed_our_state can accept the state we are in right now, so this
assert no longer is correct.
@majestrate majestrate requested a review from tewinget November 1, 2022 22:44
@majestrate majestrate merged commit 1c51bb1 into oxen-io:dev Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants