Protocol draft - for discussion only #4

Open
wants to merge 12 commits into
base: master

danarmak

I found it interesting to try to write a network protocol specification. This is a first draft and it might well be completely unsuitable. I'm submitting the PR only to generate discussion, not for merging.

@benjchristensen
Contributor

Great, I'll spend time reviewing this later tonight or tomorrow morning.


Missing: how to run on top of HTTP/2. Possibly the structure or semantics of publisher name strings.
Might be removed: extension support.
Need to choose serialization method (protobuf, msgpack, other?)
Contributor

Are you suggesting we define one, or just that the bytes should be serializable using any of these mechanisms?

Author

I'm suggesting we pick one and mandate it. This is for serializing the framing and the protocol messages themselves, not the inner payloads. The payloads are necessarily opaque to the protocol, but the user can choose to use the same serialization format for the payloads for synergy.

@benjchristensen
Contributor

@danarmak This is great, especially how quickly you put it together, and exactly the type of collaboration I was hoping for (particularly since defining this type of protocol is not my comfort zone).

@benjchristensen
Contributor

cc @tmontgomery

@benjchristensen
Contributor

This is related to "Start Network Protocol Document" #2

@NiteshKant

I would like to discuss an option of splitting this protocol into the following parts:

  1. Initial handshake
  2. Stream multiplexing on the same connection.
  3. Actual RS message exchange (subscribe, request, onNext, onError, etc.)
  4. Shutdown(goodbye)

Part 3 above is the core of this protocol, which essentially defines the framing semantics of an RS message. The other aspects overlap with the actual transport protocol we run RS.io on, e.g. HTTP/2 provides handshake, multiplexing and goodbye semantics.

If I were to model 3 independently of 2, it would not have the semantics of a stream having multiple publishers and subscribers. The association of a publisher + subscriber with a stream would be out of band of the message exchange, e.g.:

  • Pure TCP: A publisher + subscriber talk on a single connection and the subscription is implicit on the connection.
  • TCP with RS.io multiplexing: Initiating streams is defined by RS.io and a stream is dedicated to a publisher+subscriber
  • HTTP/2 with RS.io: HTTP/2 stream is dedicated to a publisher + subscriber.
  • UDP with RS.io: Unique ports associate a publisher to a subscriber. One can run many such streams between two endpoints each following the RS.io message exchange protocol. (I have limited exposure to UDP, so I may be awfully wrong here)

I can certainly see us defining handshake, multiplexing and shutdown semantics for all protocols apart from HTTP/2, but breaking this protocol into these independent, fine-grained parts will help us interoperate with different protocols effectively IMHO.

PS: I think this is what @maniksurtani is referring to here

@maniksurtani

@NiteshKant exactly. And yes, breaking down the protocol into such parts helps.


The full complexity of these formats may not be needed today, but future protocol extensions might benefit. Also, an implementation might encode the published elements using the same format and decode both the framing and the messages using the same parser.

Each message (frame) begins with a message type, which is a varint, followed by its contents. Messages are self-delimiting, because their structure is known from their type, and all fields are either of fixed size, self-delimited varints, or length-prefixed strings or byte arrays.
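
To make the self-delimiting property concrete, here is a minimal sketch of such framing. It assumes protobuf-style base-128 varints (the draft has not yet chosen an encoding) and a hypothetical message layout consisting of a type followed by a single length-prefixed byte array; the function names are illustrative and not part of the draft.

```python
import io


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (assumed encoding)."""
    if n < 0:
        raise ValueError("varints in this sketch are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)  # high bit clear: last byte
            return bytes(out)


def decode_varint(stream) -> int:
    """Read one varint from a binary stream, one byte at a time."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7


def encode_frame(msg_type: int, payload: bytes) -> bytes:
    """Hypothetical frame: varint type, varint length, then the payload."""
    return encode_varint(msg_type) + encode_varint(len(payload)) + payload


def decode_frame(stream):
    """Decode one frame; no outer delimiter is needed between frames."""
    msg_type = decode_varint(stream)
    length = decode_varint(stream)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a payload")
    return msg_type, payload


# Two frames written back to back can be read back without any separator.
buf = io.BytesIO(encode_frame(3, b"hello") + encode_frame(300, b""))
first = decode_frame(buf)
second = decode_frame(buf)
```

Note that `decode_varint` reads byte by byte, which is the multi-read cost raised in the review comments on this passage.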


With the structure of each message being as clearly defined as this, I don't see the need to specify an external serialization format. The simple types defined (vints, length-prefixed arrays, booleans) as a part of the message should more than cover the needs of the protocol itself, without a more formal/heavyweight serialization format/library.


If there are varints in the header, won't this lead to complex and slow multi-read() de-serialization?

Author

@experquisite The message type could be a single byte. Even if we run out of values, types could be defined where the second byte onwards specifies a subtype. I think this is probably a good change to make regardless of varint parsing efficiency, and I'll make it.

The varints used in length-prefixed arrays could be replaced with regular 32-bit ints. This would use on average 2, maybe 3 more bytes per message. My intuition was to optimize for size. But I really don't know if there would be a noticeable performance hit.

If you have the time, you can run a performance test scenario and find out...
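
The size side of this trade-off can be checked without a benchmark. The sketch below assumes base-128 varints carrying 7 payload bits per byte (the draft does not fix the encoding) and compares them with a fixed 4-byte length prefix; `varint_size` and `extra_bytes` are hypothetical helper names.

```python
import struct

# Size of a fixed, big-endian, unsigned 32-bit length prefix.
FIXED_SIZE = struct.calcsize(">I")


def varint_size(n: int) -> int:
    """Number of bytes a base-128 varint needs for a non-negative int."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def extra_bytes(length: int) -> int:
    """Bytes added per length prefix by using a fixed 32-bit int instead."""
    return FIXED_SIZE - varint_size(length)


for length in (10, 127, 128, 16_383, 16_384, 65_535):
    print(length, varint_size(length), FIXED_SIZE, extra_bytes(length))
```

Under these assumptions a fixed prefix costs 3 extra bytes for arrays shorter than 128 bytes and 2 extra bytes for arrays shorter than 16,384 bytes. This says nothing about parsing speed, which would still need measuring.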


@maniksurtani I think the need for a serialization format also arises from the variance in message types (hello, subscribe, goodbye, packedNext, etc.), which are not completely defined in the RS SPI. In order to standardize the definition, we can leverage an existing serialization library, e.g. as a protobuf IDL.
OTOH, if part of the protocol matches 1-1 with the SPI in RS, we may then just define the standard framing structure and not worry about message definitions.

This sentence was present in a previous draft but was accidentally deleted.
@danarmak
Author

danarmak commented Mar 2, 2015

@NiteshKant splitting the protocol makes each part simpler, but also makes the sum of the parts more complex, or at least much more verbose.

I'd like to propose holding off on splitting it until and unless we agree that there will be at least one supported transport other than 'byte stream' (TCP, websockets, pipes, etc) that will be able to share part 3.

Your example of TCP without multiplexing doesn't seem to require a different protocol - if you're not using multiplexing, just sending the single subscribe message is easy and skipping this message isn't worth a separate and incompatible protocol, IMO.

HTTP/2 is an open question. We could define a custom frame type and specify a protocol directly on top of the HTTP/2 transport. But that wouldn't actually have HTTP semantics, i.e. no generic HTTP software would support it. And I don't think that would be useful. I listed the use cases I could think of for RS.io over HTTP/x in #6, and they're all about using RS.io inside browsers (where websockets are a better fit) and integrating with existing HTTP stacks and servers.

To integrate usefully with HTTP, we would need to use HTTP messages and HTTP semantics. Then the actual protocol becomes different enough from RS.io over TCP that they won't really share anything except the underlying RS semantics. At least that's the conclusion I came to and described in #6. What do you think?

As for UDP, the RS semantics need to be considered before the RS.io protocol. The nature of UDP is that the sender isn't constrained by back-pressure from the receiver. And some messages (stream elements) get lost, so if the RS subscriber sends demand(10) and receives only 5 onNext calls, how will it know whether to wait for more (because there's no more data for now) or to call demand again (because another 5 got lost on the way)? What's the behavior you want to achieve by using UDP and not TCP?

2. If an extension changes the semantics of message types defined in this specification or by another extension, the modified behavior MUST be negotiated by at least one of the parties sending, and the other acknowledging, a message (defined by the extension being discussed) that declares the new behavior as active. A party supporting such an extension SHOULD NOT send messages whose semantics are modified by it before this negotiation is completed (i.e. the acknowledgement message is received).

The client can optimistically send more messages after the `clientHello` without waiting for the `serverHello`. If it eventually receives a `serverHello` with a different protocol version, it must consider that its messages were discarded. Future protocol versions will not be backward-compatible with version 0, in the sense that if a server supports multiple versions (e.g. both version 0 and some future version 1), it must wait for the `clientHello` and then send a `serverHello` with a version number matching the client's.
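
The version rule can be sketched roughly as follows. The names `choose_server_version` and `Client` are hypothetical, and the fallback to the server's highest version when the client's version is unsupported is an assumption; the quoted text does not say what the server does in that case.

```python
from typing import Iterable, List


def choose_server_version(client_version: int, supported: Iterable[int]) -> int:
    """Pick the version a server puts in its serverHello."""
    versions = set(supported)
    if not versions:
        raise ValueError("server must support at least one version")
    if client_version in versions:
        return client_version  # match the client's version
    return max(versions)  # ASSUMPTION: fallback is not specified in the draft


class Client:
    """Tracks messages sent optimistically before serverHello arrives."""

    def __init__(self, version: int) -> None:
        self.version = version
        self.optimistic: List[str] = []

    def send(self, message: str) -> None:
        self.optimistic.append(message)

    def on_server_hello(self, server_version: int) -> List[str]:
        """Return the messages that must be treated as discarded."""
        if server_version == self.version:
            discarded: List[str] = []
        else:
            discarded = list(self.optimistic)
        self.optimistic.clear()
        return discarded


client = Client(version=1)
client.send("subscribe")
# A server that only speaks version 0 answers with 0, so the client
# has to assume its optimistic "subscribe" was dropped.
lost = client.on_server_hello(choose_server_version(1, [0]))
```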


The negotiation looks fine. It might be good to mention that the union of extensions is what is chosen. I.e. both ends must support it and agree to use it. Also, ordering of extensions might be necessary to specify. Just some wording to be clear.

It might be good to think of most operations as extensions, such as serialization, compression, encryption, etc. That might be a cleaner way to specify these changing needs. If so, we might just borrow some HTTP semantics here. Lots of good stuff that can be leveraged.

Author

I pushed an update that clarifies extension negotiation.

What is chosen is not the union but the intersection of extensions. I suspect this is what you mean too, since I don't see how the union could work; it would include extensions not supported by one of the two parties.
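
As a rough illustration of the intersection rule, assuming each side advertises a list of extension names (the names below are made up) and that the client's ordering wins, which is an assumption since ordering has not been specified:

```python
from typing import Iterable, List


def negotiate_extensions(client_offered: Iterable[str],
                         server_supported: Iterable[str]) -> List[str]:
    """Return the extensions both parties support, in the client's order."""
    supported = set(server_supported)
    return [ext for ext in client_offered if ext in supported]


active = negotiate_extensions(
    ["compression", "encryption", "tracing"],  # hypothetical client list
    ["encryption", "compression"],             # hypothetical server list
)
```

Here `tracing` is dropped because only the client supports it, which is exactly the case a union would get wrong.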

@danarmak danarmak mentioned this pull request Mar 13, 2015
--> goodbye(reason: String)
<-- goodbye(reason: String)

Sending `goodbye` implicitly closes all open streams, equivalently to receiving `cancel` or `onError` messages.
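
One way to picture this, as a sketch only: a session object that drops every open stream when `goodbye` is sent or received and remembers which side initiated the close. The `Session` class and its fields are hypothetical and not part of the draft.

```python
from typing import Dict, List, Optional


class Session:
    """Minimal model of a connection carrying several streams."""

    def __init__(self) -> None:
        self.open_streams: Dict[int, str] = {}  # stream id -> publisher name
        self.closed_by: Optional[str] = None    # "local" or "remote"
        self.close_reason: Optional[str] = None

    def open_stream(self, stream_id: int, publisher: str) -> None:
        if self.closed_by is not None:
            raise RuntimeError("session already closed")
        self.open_streams[stream_id] = publisher

    def goodbye(self, reason: str, sent_by_us: bool) -> List[int]:
        """Handle a sent or received goodbye; return the streams it closed."""
        closed = sorted(self.open_streams)
        self.open_streams.clear()  # implicit close of every open stream
        if self.closed_by is None:
            # Only the first goodbye decides who closed the session.
            self.closed_by = "local" if sent_by_us else "remote"
            self.close_reason = reason
        return closed


session = Session()
session.open_stream(1, "prices")
session.open_stream(2, "orders")
closed_ids = session.goodbye("shutting down", sent_by_us=True)
```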


Might be good to use ACK for the acknowledgement instead. That way it is differentiated from the goodbye.

@danarmak
Author

When I pushed 59a9064, a conversation about extensions disappeared because I deleted the line it was attached to. I can't find a way to access it anymore. Sorry about that - what should I have done instead?

@danarmak
Author

@tmontgomery I'm going to push the changes removing varints and limiting messages to 64K, but then the comments asking for that will (probably) disappear. Is that OK, or is there a better GitHub workflow that I don't know about? It just feels odd when history goes missing with Git.

This makes it clear to both parties which party closed the session.
6 participants