Protocol draft - for discussion only #4
Great, I'll spend time reviewing this later tonight or tomorrow morning. |
Missing: how to run on top of HTTP/2. Possibly the structure or semantics of publisher name strings. | ||
Might be removed: extension support. | ||
Need to choose serialization method (protobuf, msgpack, other?) |
Are you suggesting we define one, or just that the bytes should be serializable using any of these mechanisms?
I'm suggesting we pick one and mandate it. This is for serializing the framing and the protocol messages themselves, not the inner payloads. The payloads are necessarily opaque to the protocol, but the user can choose to use the same serialization format for the payloads for synergy.
@danarmak This is great, especially how quickly you put it together, and exactly the type of collaboration I was hoping for (particularly since defining this type of protocol is not my comfort zone). |
cc @tmontgomery |
This is related to "Start Network Protocol Document" #2 |
I would like to discuss an option of splitting this protocol into the following parts:
3 above is the core of this protocol which essentially defines the framing semantics of an RS message. The other aspects have overlap from the actual transport protocol we run RS.io on. eg: HTTP/2 provides handshake, multiplexing and goodbye semantics. If I was to model 3 independent of 2, it will not have the semantics of a stream having multiple publishers and subscribers. The association of a publisher + subscriber to a stream will be out of band of the message exchange. eg:
I can certainly see us defining handshake, multiplexing and shutdown semantics for all protocols apart from HTTP/2 but breaking this protocol into these independent fine grained parts will help us interplay with different protocols effectively IMHO. PS: I think this is what @maniksurtani is referring to here |
@NiteshKant exactly. And yes, breaking down the protocol into such parts helps. |
The full complexity of these formats may not be needed today, but future protocol extensions might benefit. Also, an implementation might encode the published elements using the same format and decode both the framing and the messages using the same parser. | ||
Each message (frame) begins with a message type, which is a varint, followed by its contents. Messages are self-delimiting, because their structure is known from their type, and all fields are either of fixed size, self-delimited varints, or length-prefixed strings or byte arrays. |
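To make the self-delimiting property concrete, here is a minimal sketch of the framing described above. It assumes a LEB128-style unsigned varint (7 data bits per byte, as protobuf uses), which the draft text quoted here does not confirm. The message type number `GOODBYE = 2` and the helper names are hypothetical.

```python
import io

GOODBYE = 2  # hypothetical message type number, not taken from the draft


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer, 7 bits per byte, low-order group first."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(stream) -> int:
    """Read one varint from a binary stream, one byte at a time."""
    result = 0
    shift = 0
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("truncated varint")
        result |= (b[0] & 0x7F) << shift
        if not (b[0] & 0x80):
            return result
        shift += 7


def encode_bytes(payload: bytes) -> bytes:
    """Length-prefixed byte array: varint length, then the bytes."""
    return encode_varint(len(payload)) + payload


def decode_bytes(stream) -> bytes:
    length = decode_varint(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("truncated byte array")
    return data


def encode_goodbye(reason: str) -> bytes:
    """Frame = message type varint followed by the message's fields."""
    return encode_varint(GOODBYE) + encode_bytes(reason.encode("utf-8"))


def decode_message(stream):
    """The type determines the layout, so no outer length field is needed."""
    msg_type = decode_varint(stream)
    if msg_type == GOODBYE:
        return ("goodbye", decode_bytes(stream).decode("utf-8"))
    raise ValueError(f"unknown message type {msg_type}")
```

Two frames written back to back into one `io.BytesIO` buffer can be read with two consecutive `decode_message` calls, because each call consumes exactly one frame.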
With the structure of each message being as clearly defined as this, I don't see the need to specify an external serialization format. The simple types defined (varints, length-prefixed arrays, booleans) as a part of the message should more than cover the needs of the protocol itself, without a more formal/heavyweight serialization format/library.
If there are varints in the header, won't this lead to complex and slow multi-read() de-serialization?
@experquisite The message type could be a single byte. Even if we run out of values, types could be defined where the second byte onwards specifies a subtype. I think this is probably a good change to make regardless of varint parsing efficiency, and I'll make it.
The varints used in length-prefixed arrays could be replaced with regular 32-bit ints. This would use on average 2, maybe 3 more bytes per message. My intuition was to optimize for size, but I really don't know if there would be a noticeable performance hit.
If you have the time, you can run a performance test scenario and find out...
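The size side of this trade-off is easy to quantify. The sketch below compares a varint length prefix with a fixed 4-byte prefix; it assumes a LEB128-style varint and big-endian byte order for the fixed prefix, neither of which is settled in this thread. It measures bytes only and says nothing about parsing speed.

```python
import struct


def varint_size(n: int) -> int:
    """Number of bytes a 7-bits-per-byte varint needs for n."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def fixed32_prefix(payload: bytes) -> bytes:
    """Fixed 4-byte length prefix (big-endian is an assumption here)."""
    return struct.pack(">I", len(payload)) + payload


def extra_bytes_with_fixed32(length: int) -> int:
    """Bytes added per array when a 4-byte prefix replaces the varint."""
    return 4 - varint_size(length)


for length in (100, 1_000, 100_000):
    print(length, extra_bytes_with_fixed32(length))
```

Under these assumptions, arrays shorter than 128 bytes cost 3 extra bytes and arrays shorter than 16384 bytes cost 2, which is in line with the estimate above. The fixed prefix does let a reader fetch the length with one fixed-size read.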
@maniksurtani I think the need for a serialization format also arises from the variance in message types (hello, subscribe, goodbye, packedNext, etc.), which are not completely defined in the RS SPI. To standardize the definitions, we can leverage an existing serialization library, e.g. as a protobuf IDL.
OTOH, if part of the protocol matches 1-1 with the SPI in RS, we may then just define the standard framing structure and not worry about message definitions.
This sentence was present in a previous draft but was accidentally deleted.
@NiteshKant splitting the protocol makes each part simpler, but also makes the sum of the parts more complex, or at least much more verbose. I'd like to propose holding off on splitting it until and unless we agree that there will be at least one supported transport other than 'byte stream' (TCP, websockets, pipes, etc) that will be able to share part 3. Your example of TCP without multiplexing doesn't seem to require a different protocol - if you're not using multiplexing, just sending the single subscribe message is easy and skipping this message isn't worth a separate and incompatible protocol, IMO.
HTTP/2 is an open question. We could define a custom frame type and specify a protocol directly on top of the HTTP/2 transport. But that wouldn't actually have HTTP semantics, i.e. no generic HTTP software would support it. And I don't think that would be useful. I listed the use cases I could think of for RS.io over HTTP/x in #6, and they're all about using RS.io inside browsers (where websockets are a better fit) and integrating with existing HTTP stacks and servers. To integrate usefully with HTTP, we would need to use HTTP messages and HTTP semantics. Then the actual protocol becomes different enough from RS.io over TCP that they won't really share anything except the underlying RS semantics. At least that's the conclusion I came to and described in #6. What do you think?
As for UDP, the RS semantics need to be considered before the RS.io protocol. The nature of UDP is that the sender isn't constrained by back-pressure from the receiver. And some messages (stream elements) get lost, so if the RS subscriber sends
2. If an extension changes the semantics of message types defined in this specification or by another extension, the modified behavior MUST be negotiated by at least one of the parties sending, and the other acknowledging, a message (defined by the extension being discussed) that declares the new behavior as active. A party supporting such an extension SHOULD NOT send messages whose semantics are modified by it before this negotiation is completed (i.e. the acknowledgement message is received). | ||
The client can optimistically send more messages after the `clientHello` without waiting for the `serverHello`. If it eventually receives a `serverHello` with a different protocol version, it must consider that its messages were discarded. Future protocol versions will not be backward-compatible with version 0, in the sense that if a server supports multiple versions (e.g. both version 0 and some future version 1), it must wait for the `clientHello` and then send a `serverHello` with a version number matching the client's.
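The version rule in this paragraph can be sketched as two small functions, one per side. The function names and the `UnsupportedVersion` error are hypothetical. What a server should do when it does not support the client's version is not covered by the text quoted here, so the sketch simply raises.

```python
class UnsupportedVersion(Exception):
    """Hypothetical error; the quoted draft text does not define this case."""


def choose_server_hello_version(supported, client_version):
    """Server side: wait for clientHello, then answer with the client's version."""
    if client_version not in supported:
        raise UnsupportedVersion(client_version)
    return client_version


def discarded_messages(client_version, server_hello_version, optimistic):
    """Client side: messages sent before serverHello arrived.

    If the server answered with a different version, every optimistically
    sent message must be treated as discarded.
    """
    if server_hello_version != client_version:
        return list(optimistic)
    return []
```

For example, a server supporting versions `{0, 1}` answers a version 0 `clientHello` with version 0, and the client's optimistic messages stand. A client that sent version 1 and received version 0 would treat all of them as discarded.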
The negotiation looks fine. It might be good to mention that the union of extensions is what is chosen. I.e. both ends must support it and agree to use it. Also, ordering of extensions might be necessary to specify. Just some wording to be clear.
It might be good to think of most operations as extensions, such as serialization, compression, encryption, etc. That might be a cleaner way to specify these changing needs. If so, we might just borrow some HTTP semantics here. Lots of good stuff that can be leveraged.
I pushed an update that clarifies extension negotiation.
What is chosen is not the union but the intersection of extensions. I suspect this is what you mean too, since I don't see how the union could work; it would include extensions not supported by one of the two parties.
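As a small illustration of the intersection rule, here is a sketch with hypothetical extension names. Keeping the client's ordering is an assumption made only so the result is deterministic; the draft's actual ordering rule is not quoted in this thread.

```python
def negotiate_extensions(client_exts, server_exts):
    """Return the extensions both parties support (the intersection).

    The result keeps the order in which the client listed them. That
    ordering choice is an assumption for this sketch.
    """
    server_set = set(server_exts)
    return [ext for ext in client_exts if ext in server_set]


# Hypothetical extension names, for illustration only.
active = negotiate_extensions(
    ["compression", "encryption", "tracing"],
    ["encryption", "compression"],
)
print(active)
```

Here `tracing` is dropped because only the client supports it. A union would have kept it, leaving the server to handle an extension it does not know.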
--> goodbye(reason: String) | ||
<-- goodbye(reason: String) | ||
Sending `goodbye` implicitly closes all open streams, equivalently to receiving `cancel` or `onError` messages. |
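One way to read this rule is sketched below. The `Session` class, its fields, and the callback signature are all hypothetical. The sketch does not model an acknowledgement or distinguish `cancel` from `onError`. It only shows that a `goodbye` in either direction closes every open stream without per-stream messages.

```python
class Session:
    """Toy session that tracks open streams (illustrative only)."""

    def __init__(self):
        self.open_streams = {}  # stream id -> callback invoked on close
        self.outbox = []        # frames queued for the peer
        self.closed_by = None   # "local" or "remote" once closed

    def open_stream(self, stream_id, on_close):
        self.open_streams[stream_id] = on_close

    def send_goodbye(self, reason):
        """Queue goodbye and close every stream, with no per-stream messages."""
        self.outbox.append(("goodbye", reason))
        self._close_all("local", reason)

    def receive_goodbye(self, reason):
        """The peer closed the session; local streams end the same way."""
        self._close_all("remote", reason)

    def _close_all(self, closed_by, reason):
        if self.closed_by is None:
            self.closed_by = closed_by
        streams, self.open_streams = self.open_streams, {}
        for stream_id, on_close in streams.items():
            on_close(stream_id, reason)
```

After `send_goodbye("shutting down")`, the outbox holds a single `goodbye` frame, `open_streams` is empty, and each stream's callback has run once.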
Might be good to use ACK for the acknowledgement instead. That way it is differentiated from the goodbye.
When I pushed 59a9064, a conversation about extensions disappeared because I deleted the line it was attached to. I can't find a way to access it anymore. Sorry about that - what should I have done instead? |
@tmontgomery I'm going to push the changes removing varints and limiting messages to 64K, but then the comments asking for that will (probably) disappear. Is that OK, or is there a better github workflow that I don't know about? It just feels odd when history goes missing with Git. |
This makes it clear to both parties which party closed the session.
I found it interesting to try to write a network protocol specification. This is a first draft and it might well be completely unsuitable. I'm submitting the PR only to generate discussion, not for merging.