-
Notifications
You must be signed in to change notification settings - Fork 617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cleanup: refactor and use bicopy everywhere #2944
Conversation
4ec7fa0
to
63ae319
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes the behavior but you did not provide any reason for the change. If the purpose is to remove eliminate a package, just inline the package code in the single user. There is no need to change the code.
If you want to reimplement it and/or change the behavior, please explain why.
Also we have good tests for port_darwin? Before rewriting we should have good tests ensuring that we don't change the behavior by mistake. It should be simple test coping data over unix sockets.
6a612d9
to
c8e7582
Compare
a2a75a4
to
b92ac24
Compare
b92ac24
to
3d32079
Compare
I think the common behavior in all 3 cases is that we start an endless io.Copy() in both directions. The copy is not expected to complete in normal conditions. This is the case for vz/network_darwin.go, I' not sure about the other 2, maybe port forwarding may end earlier in normal conditions. In the vz network case, the copy should stop only when the vm is shutting down, or if socket_vmnet crashed or terminated. In this case we want to log an error or warning message about the peer closing the socket. Any other error should be logged as an error. In base case we want to log the issue immediately without a delay, so we don't lose errors. When one of the io.Copy() goroutines fail, we want to other to fail immediately. I'm not sure this is ensured by current code. This should fixed and tested before we unify the implementation. When both io.Copy() finished, we already logged the error, so there is nothing to return to the caller, even if we have a caller waiting for the result. Usually saving the first error (synchronized with a lock or atomics) and returning it will be more helpful than returning 8 possible errors joined. After something fail, the next errors are usually just noise. If we can ensure that the second copy fail immediately when the first fail, we can return the first error to the caller and the caller can log it. This will be more useful for a library since different callers may want to use different loggers. May not be relevant to lima use case. If we improve bicopy we want to share them with upstream. Did you open upstream issues for the problems you are trying to fix in bicopy? |
I'm just removing features we aren't using. If they need early cancellation that's just fine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the simplified Bicopy!
Good reason why this is not relevant to upstream. |
Looks like I caught a test flake - I don't see bicopy in the stack trace.
|
I've triggered a re-run, but I'm waiting for approval from another reviewer before merging; not going to work my way through 72 comments... 😄 |
<-finish | ||
// TODO: return copied bytes | ||
}() | ||
wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need bicopy to handle the error and not throw it to the caller ??
I would prefer it to be handled by the caller itself. It gives us better control. Handling error ourself looks generic but all we are doing is just logging. But in some cases we may need to act on error in caller side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There has been a lot of back and forth on this already. I'm inclined to leave it as is. Another option is to pass a handler callback, but nobody needs this today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but nobody needs this today
- If errors are returned back, callers will have more control to handle logging parts. Some cases might need more detailed logging rather than generic message context: io.Copy(y, x)
which is not self explanatory.
If you have a valid reason for not returning error please let me know.
Also i think func Bicopy(context string, x, y NamedReadWriter) error {}
should be good as well. We don't need to complicate with handlers / callbacks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nirs expressed concern that if io.Copy
in one direction returns, the io.Copy
in the other direction might not - and returning an error cannot happen until both functions complete. That's why the latest versions logs directly instead of returning an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will eventually fail because one end of stream is closed already.
Since we have wg.Done at last. Returning error after both io.Copy are closed looks correct to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With errgroup as soon as one is returned non-nil its over. It will stop the other go func as well.
If we are using reader/writer then we will not even have close so it's just stopping the loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goroutine is canceled magically? how does it stop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see anything in the docs about cancelling goroutine blocked on a syscall.
https://pkg.go.dev/golang.org/x/sync/errgroup#WithContext
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes you are right, My mistake.
Only context is closed. So we need to check for context close and close the connection.
For pure reader / writer until that syscall is returned we have no other option. But most of our reader / writer have close method so io.Closable should be fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should also be added to tcpproxy upstream - ensure termination and return an error. With both we can use it instead of Bicopy.
Improve error logging in bicopy and extend its use to `pkg/vz/network_darwin.go`. Signed-off-by: Tamir Duberstein <[email protected]>
logrus.WithError(err).Debug("failed to call xCloser.Close") | ||
defer wg.Done() | ||
if _, err := io.Copy(x.ReadWriter, y.ReadWriter); err != nil { | ||
logrus.WithError(err).Errorf("%s: io.Copy(%s, %s)", context, x.Name, y.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logs an error like:
"forwarding packets: io.Copy(VMNENT, VZ)" ... {"error": "actual error from io.Copy"}
For code already calling bicopy, this is an improvement, since before we logged a DEBUG message without any information on which direction failed, or what is the purpose of the copy:
"failed to call io.Copy" ... ... {"error": "actual error from io.Copy"}
But for the call from network_darwin this is worse. Before we had this clear error:
"Failed to forward packets from VZ to VMNET: actual error from io.Copy"
I would like to keep the previous error message or something very close. I don't see the value of logging the error separately using logrous.WithError().
It can be done using:
logrus.Errorf("Failed to %s from %s to %s: %s", context, y.Name, x.Name, err)
defer ac.Close() | ||
unixConn, err := net.DialUnix("unix", nil, plf.unixAddr) | ||
if err != nil { | ||
return err | ||
logrus.WithError(err).Errorf("pseudoloopback forwarder: failed to dial %q", plf.unixAddr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error adds context (good) but logs the error separately (bad). Before we logged only the error from net.DialUnix (line 139).
}, bicopy.NamedReadWriter{ | ||
ReadWriter: unixConn, | ||
Name: "unix", | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The context does not match other context ("pseudoloopback fowarder" vs "forward packets") and the names of the readwriters are little bit too generic, I'm not sure what is the context on this code, may be others can suggest better values.
@@ -29,7 +29,14 @@ func HandleTCPConnection(ctx context.Context, client *guestagentclient.GuestAgen | |||
} | |||
|
|||
rw := &GrpcClientRW{stream: stream, id: id, addr: guestAddr, protocol: "tcp"} | |||
bicopy.Bicopy(rw, conn, nil) | |||
|
|||
bicopy.Bicopy("tcp tunnel", bicopy.NamedReadWriter{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the tunnel for tcp port forwarding, so maybe a better context would be "forward tcp packets for port N". We probably can get the port from conn.RemoteAddr(), since we forward port N to port N on the host.
Name: guestAddr, | ||
}, bicopy.NamedReadWriter{ | ||
ReadWriter: conn, | ||
Name: id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
id is a description of the entire port forwading action: "tcp-%s-%s" (see line 17), so it is not a good for the conn.
|
||
bicopy.Bicopy("tcp tunnel", bicopy.NamedReadWriter{ | ||
ReadWriter: rw, | ||
Name: in.Id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what Id is. Can you share the output we get in runtime?
@@ -32,7 +32,14 @@ func (s *TunnelServer) Start(stream api.GuestService_TunnelServer) error { | |||
return err | |||
} | |||
rw := &GRPCServerRW{stream: stream, id: in.Id} | |||
bicopy.Bicopy(rw, conn, nil) | |||
|
|||
bicopy.Bicopy("tcp tunnel", bicopy.NamedReadWriter{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably use the same context used in the client with the guest port.
Since I don't remember why I care about this and it's not fixing obvious bugs, I'm closing. If you want to reuse this work, you have my permission to do so without attribution. |
See commit message.