Protocol draft - for discussion only #4

danarmak · 2015-02-27T00:00:53Z

I found it interesting to try to write a network protocol specification. This is a first draft and it might well be completely unsuitable. I'm submitting the PR only to generate discussion, not for merging.

benjchristensen · 2015-02-27T02:28:40Z

Great, I'll spend time reviewing this later tonight or tomorrow morning.

benjchristensen · 2015-02-27T03:52:20Z

NETWORK_PROTOCOL.md

+
+Missing: how to run on top of HTTP/2. Possibly the structure or semantics of publisher name strings.
+Might be removed: extension support.
+Need to choose serialization method (protobuf, msgpack, other?)


Are you suggesting we define one, or just that the bytes should be serializable using any of these mechanisms?

I'm suggesting we pick one and mandate it. This is for serializing the framing and the protocol messages themselves, not the inner payloads. The payloads are necessarily opaque to the protocol, but the user can choose to use the same serialization format for the payloads for synergy.

benjchristensen · 2015-02-27T04:02:42Z

@danarmak This is great, especially how quickly you put it together, and exactly the type of collaboration I was hoping for (particularly since defining this type of protocol is not my comfort zone).

benjchristensen · 2015-02-27T04:03:20Z

cc @tmontgomery

benjchristensen · 2015-02-27T04:04:45Z

This is related to "Start Network Protocol Document" #2

NiteshKant · 2015-03-02T20:12:22Z

I would like to discuss an option of splitting this protocol into the following parts:

1. Initial handshake
2. Stream multiplexing on the same connection
3. Actual RS message exchange (subscribe, request, onNext, onError, etc.)
4. Shutdown (goodbye)

Part 3 above is the core of this protocol: it essentially defines the framing semantics of an RS message. The other parts overlap with what the underlying transport protocol that RS.io runs on already provides, e.g. HTTP/2 provides handshake, multiplexing and goodbye semantics.

If I were to model 3 independently of 2, it would not have the semantics of a stream having multiple publishers and subscribers. The association of a publisher + subscriber with a stream would be out of band of the message exchange, e.g.:

- Pure TCP: a publisher + subscriber talk on a single connection, and the subscription is implicit in the connection.
- TCP with RS.io multiplexing: initiating streams is defined by RS.io, and each stream is dedicated to a publisher + subscriber.
- HTTP/2 with RS.io: each HTTP/2 stream is dedicated to a publisher + subscriber.
- UDP with RS.io: unique ports associate a publisher with a subscriber. One can run many such streams between two endpoints, each following the RS.io message exchange protocol. (I have limited exposure to UDP, so I may be awfully wrong here.)

I can certainly see us defining handshake, multiplexing and shutdown semantics for all protocols other than HTTP/2, but breaking this protocol into these independent, fine-grained parts will help us interoperate with different protocols effectively, IMHO.
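To make the proposed split concrete, here is a minimal Python sketch, assuming the layering described above. The class names, the numeric type values and `demultiplex` are hypothetical, not part of the draft. It keeps the RS message exchange (part 3) free of stream ids and lets a separate multiplexing layer (part 2) add them.

```python
from dataclasses import dataclass
from enum import Enum


class RsMessageType(Enum):
    # Part 3: the core RS message exchange, with no notion of streams or transports.
    # The numeric values are arbitrary placeholders.
    SUBSCRIBE = 1
    REQUEST = 2
    ON_NEXT = 3
    ON_ERROR = 4
    ON_COMPLETE = 5
    CANCEL = 6


@dataclass(frozen=True)
class RsMessage:
    kind: RsMessageType
    payload: bytes = b""


@dataclass(frozen=True)
class MuxFrame:
    # Part 2: an RS.io multiplexing layer that only adds a stream id.
    # Over HTTP/2 this wrapper would be unnecessary: the HTTP/2 stream plays this role.
    stream_id: int
    message: RsMessage


def demultiplex(frames):
    """Group RS messages by stream id; each stream serves one publisher + subscriber pair."""
    streams = {}
    for frame in frames:
        streams.setdefault(frame.stream_id, []).append(frame.message)
    return streams
```

Over pure TCP without multiplexing, the `RsMessage` values would be written directly, since the connection itself identifies the single subscription.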

PS: I think this is what @maniksurtani is referring to here

maniksurtani · 2015-03-02T21:18:03Z

@NiteshKant exactly. And yes, breaking down the protocol into such parts helps.

maniksurtani · 2015-03-02T21:22:36Z

NETWORK_PROTOCOL.md

+
+The full complexity of these formats may not be needed today, but future protocol extensions might benefit. Also, an implementation might encode the published elements using the same format and decode both the framing and the messages using the same parser.
+
+Each message (frame) begins with a message type, which is a varint, followed by its contents. Messages are self-delimiting, because their structure is known from their type, and all fields are either of fixed size, self-delimited varints, or length-prefixed strings or byte arrays.


With the structure of each message being as clearly defined as this, I don't see the need to specify an external serialization format. The simple types defined (varints, length-prefixed arrays, booleans) as a part of the message should more than cover the needs of the protocol itself, without a more formal/heavyweight serialization format/library.

If there are varints in the header, won't this lead to complex and slow multi-read() de-serialization?

@experquisite The message type could be a single byte. Even if we run out of values, types could be defined where the second byte onwards specifies a subtype. I think this is probably a good change to make regardless of varint parsing efficiency, and I'll make it.
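A sketch of the single-byte type with a subtype escape, assuming one reserved value (0xFF here, chosen arbitrarily) marks an extended type whose second byte is the subtype. The draft doesn't fix that value, and the function names are hypothetical. Unlike with a varint, the decoder knows after at most two bytes where the message contents begin.

```python
EXTENDED_TYPE = 0xFF  # hypothetical: a type value meaning "the next byte is a subtype"


def encode_message_type(msg_type, subtype=None):
    """Encode a one-byte type, plus a subtype byte for extended types."""
    if not 0 <= msg_type <= 0xFF:
        raise ValueError("message type must fit in one byte")
    if msg_type == EXTENDED_TYPE:
        if subtype is None or not 0 <= subtype <= 0xFF:
            raise ValueError("extended types need a one-byte subtype")
        return bytes([msg_type, subtype])
    if subtype is not None:
        raise ValueError("only extended types carry a subtype")
    return bytes([msg_type])


def decode_message_type(buf, offset=0):
    """Return (type, subtype or None, offset of the first content byte)."""
    if offset >= len(buf):
        raise ValueError("missing message type byte")
    msg_type = buf[offset]
    if msg_type != EXTENDED_TYPE:
        return msg_type, None, offset + 1
    if offset + 1 >= len(buf):
        raise ValueError("extended type is missing its subtype byte")
    return msg_type, buf[offset + 1], offset + 2
```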

The varints used in length-prefixed arrays could be replaced with regular 32-bit ints. This would use on average 2, maybe 3, more bytes per message. My intuition was to optimize for size. But I really don't know if there would be a noticeable performance hit.
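For comparison, a sketch of both length encodings, assuming the varint is the unsigned LEB128 scheme protobuf uses (the draft doesn't name a specific varint format); the function names are illustrative. The decoder shows why varints raise the multi-read concern: the field's size is only known once the byte without the continuation bit has been seen.

```python
import struct


def encode_varint(n):
    """Unsigned LEB128 varint: 7 bits per byte, high bit set means 'more bytes follow'."""
    if n < 0:
        raise ValueError("only unsigned values are encoded here")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def decode_varint(buf, offset=0):
    """Return (value, new offset); the field length is unknown until its last byte."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, offset
        shift += 7


def encode_int32(n):
    """Fixed-size alternative: always 4 bytes, big-endian unsigned."""
    return struct.pack(">I", n)


for length in (10, 200, 20_000, 3_000_000):
    print(length, len(encode_varint(length)), len(encode_int32(length)))
```

Lengths below 128 take one varint byte instead of four, which is where the 2 to 3 byte saving comes from; from 2^21 upward the varint needs four or more bytes, so the saving disappears. The fixed-size form lets a reader fetch the whole length field with one read of known size.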

If you have the time, you can run a performance test scenario and find out...

@maniksurtani I think the need for a serialization format also arises from the variety of message types (hello, subscribe, goodbye, packedNext, etc.), which are not fully defined in the RS SPI. To standardize their definitions, we can leverage an existing serialization library, e.g. as a protobuf IDL.
OTOH, if part of the protocol matches 1-1 with the SPI in RS, we may then just define the standard framing structure and not worry about message definitions.


danarmak · 2015-03-02T22:44:03Z

@NiteshKant splitting the protocol makes each part simpler, but also makes the sum of the parts more complex, or at least much more verbose.

I'd like to propose holding off on splitting it until and unless we agree that there will be at least one supported transport other than 'byte stream' (TCP, websockets, pipes, etc) that will be able to share part 3.

Your example of TCP without multiplexing doesn't seem to require a different protocol - if you're not using multiplexing, just sending the single subscribe message is easy and skipping this message isn't worth a separate and incompatible protocol, IMO.

HTTP/2 is an open question. We could define a custom frame type and specify a protocol directly on top of the HTTP/2 transport. But that wouldn't actually have HTTP semantics, i.e. no generic HTTP software would support it. And I don't think that would be useful. I listed the use cases I could think of for RS.io over HTTP/x in #6, and they're all about using RS.io inside browsers (where websockets are a better fit) and integrating with existing HTTP stacks and servers.

To integrate usefully with HTTP, we would need to use HTTP messages and HTTP semantics. Then the actual protocol becomes different enough from RS.io over TCP that they won't really share anything except the underlying RS semantics. At least that's the conclusion I came to and described in #6. What do you think?

As for UDP, the RS semantics need to be considered before the RS.io protocol. The nature of UDP is that the sender isn't constrained by back-pressure from the receiver. And some messages (stream elements) get lost, so if the RS subscriber sends demand(10) and receives only 5 onNext calls, how will it know whether to wait for more (because there's no more data for now) or to call demand again (because another 5 got lost on the way)? What's the behavior you want to achieve by using UDP and not TCP?
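A small simulation of the `demand(10)` example, assuming UDP can silently drop any element; `DemandTracker` and `simulate` are hypothetical names. Both scenarios leave the subscriber in exactly the same state, so nothing it has observed can tell it whether to wait or to request again.

```python
from dataclasses import dataclass


@dataclass
class DemandTracker:
    """Subscriber-side bookkeeping of outstanding demand (hypothetical helper)."""
    outstanding: int = 0

    def request(self, n):
        self.outstanding += n

    def on_next(self):
        if self.outstanding == 0:
            raise RuntimeError("publisher sent more than was requested")
        self.outstanding -= 1


def simulate(available, lost, requested=10):
    """Publisher sends min(available, requested) elements; indices in `lost` never arrive."""
    tracker = DemandTracker()
    tracker.request(requested)
    for i in range(min(available, requested)):
        if i not in lost:  # a dropped datagram never reaches the subscriber
            tracker.on_next()
    return tracker.outstanding


# The publisher only had 5 elements, and none were lost.
no_more_data = simulate(available=5, lost=set())
# The publisher had plenty and sent 10, but elements 5..9 were dropped.
dropped = simulate(available=100, lost={5, 6, 7, 8, 9})
```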

tmontgomery · 2015-03-13T19:14:52Z

NETWORK_PROTOCOL.md

+ 2. If an extension changes the semantics of message types defined in this specification or by another extension, the modified behavior MUST be negotiated by at least one of the parties sending, and the other acknowledging, a message (defined by the extension being discussed) that declares the new behavior as active. A party supporting such an extension SHOULD NOT send messages whose semantics are modified by it before this negotiation is completed (i.e. the acknowledgement message is received).
+
+The client can optimistically send more messages after the `clientHello` without waiting for the `serverHello`. If it eventually receives a `serverHello` with a different protocol version, it must consider its messages discarded. Future protocol versions will not be backward-compatible with version 0, in the sense that if a server supports multiple versions (e.g. both version 0 and some future version 1), it must wait for the `clientHello` and then send a `serverHello` with a version number matching the client's.
+
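A sketch of the version handshake described in the last paragraph, from both sides. The function names, the supported-version set and the fallback for an unsupported client version are assumptions, since the draft leaves them open.

```python
SERVER_VERSIONS = {0, 1}  # hypothetical server speaking version 0 and a future version 1


def choose_server_hello_version(client_version, supported=SERVER_VERSIONS):
    """The version to put in serverHello, decided only after clientHello has arrived."""
    if client_version in supported:
        return client_version
    # Assumption: the draft doesn't cover this case; advertise the newest supported version.
    return max(supported)


def discarded_on_server_hello(client_version, server_version, sent_after_hello):
    """Messages sent optimistically after clientHello that the client must treat as discarded."""
    if server_version != client_version:
        return list(sent_after_hello)
    return []
```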


The negotiation looks fine. It might be good to mention that the union of extensions is what is chosen. I.e. both ends must support it and agree to use it. Also, ordering of extensions might be necessary to specify. Just some wording to be clear.

It might be good to think of most operations as extensions, such as serialization, compression, encryption, etc. That might be a cleaner way to specify these changing needs. If so, we might just borrow some HTTP semantics here; there is lots of good stuff that can be leveraged.

I pushed an update that clarifies extension negotiation.

What is chosen is not the union but the intersection of extensions. I suspect this is what you mean too, since I don't see how the union could work; it would include extensions not supported by one of the two parties.
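A sketch of intersection-based negotiation, plus the rule that a semantics-changing extension only takes effect once acknowledged. Keeping the client's offer order is an assumption (ordering was raised above but isn't settled), and the function, class and extension names are hypothetical.

```python
def negotiate_extensions(client_offer, server_supported):
    """Active extensions: the intersection of both sides, kept in the client's offer order."""
    active = []
    for ext in client_offer:
        if ext in server_supported and ext not in active:
            active.append(ext)
    return active


class ExtensionState:
    """Tracks a semantics-changing extension: offered first, active only once acknowledged."""

    def __init__(self):
        self.pending = set()
        self.active = set()

    def offer(self, ext):
        self.pending.add(ext)

    def on_ack(self, ext):
        if ext in self.pending:
            self.pending.discard(ext)
            self.active.add(ext)

    def may_use(self, ext):
        # SHOULD NOT send messages with modified semantics before the ack arrives.
        return ext in self.active
```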

tmontgomery · 2015-03-13T19:19:43Z

NETWORK_PROTOCOL.md

+    --> goodbye(reason: String)
+    <-- goodbye(reason: String)
+
+Sending `goodbye` implicitly closes all open streams, equivalently to receiving `cancel` or `onError` messages.


Might be good to use ACK for the acknowledgement instead. That way it is differentiated from the goodbye.
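A sketch of the goodbye exchange with a separate acknowledgement, as suggested here and later adopted in the `goodbyeAck` commit. The class, the message tuples and the `closed_by` field are hypothetical; closing the streams is modelled as simply clearing them.

```python
from enum import Enum, auto


class State(Enum):
    OPEN = auto()
    CLOSING = auto()  # sent goodbye, waiting for goodbyeAck
    CLOSED = auto()


class Session:
    """Hypothetical session object illustrating goodbye / goodbyeAck."""

    def __init__(self, open_streams=()):
        self.state = State.OPEN
        self.open_streams = set(open_streams)
        self.outbox = []
        self.closed_by = None

    def send_goodbye(self, reason):
        if self.state is not State.OPEN:
            return
        self.outbox.append(("goodbye", reason))
        self._close_all_streams()
        self.state = State.CLOSING
        self.closed_by = "local"

    def receive(self, message):
        kind = message[0]
        if kind == "goodbye" and self.state is not State.CLOSED:
            if self.state is State.OPEN:
                self.closed_by = "remote"
                self._close_all_streams()
                self.state = State.CLOSED
            # Acknowledge, so the sender knows which side closed and that it was heard.
            self.outbox.append(("goodbyeAck",))
        elif kind == "goodbyeAck" and self.state is State.CLOSING:
            self.state = State.CLOSED

    def _close_all_streams(self):
        # Same effect as cancel/onError on every open stream.
        self.open_streams.clear()
```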

danarmak · 2015-03-13T21:08:31Z

When I pushed 59a9064, a conversation about extensions disappeared because I deleted the line it was attached to. I can't find a way to access it anymore. Sorry about that - what should I have done instead?

danarmak · 2015-03-13T21:14:09Z

@tmontgomery I'm going to push the changes removing varints and limiting messages to 64K, but then the comments asking for that will (probably) disappear. Is that OK, or is there a better github workflow that I don't know about? It just feels odd when history goes missing with Git.
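A sketch of fixed-size framing without varints, assuming (the draft may differ) a 1-byte message type followed by a 16-bit big-endian body length, which caps a body at 65,535 bytes. `frame`, `read_frames` and that exact layout are hypothetical.

```python
import struct

MAX_BODY = 0xFFFF  # assumption: "64K" read as the largest value a 16-bit length can hold


def frame(msg_type, body):
    """Encode one message: 1-byte type, 2-byte big-endian body length, then the body."""
    if not 0 <= msg_type <= 0xFF:
        raise ValueError("message type must fit in one byte")
    if len(body) > MAX_BODY:
        raise ValueError("message body exceeds the 64K limit")
    return struct.pack(">BH", msg_type, len(body)) + body


def read_frames(stream):
    """Yield (type, body) pairs; every header is exactly 3 bytes, so no varint loop is needed."""
    offset = 0
    while offset < len(stream):
        if len(stream) - offset < 3:
            raise ValueError("truncated header")
        msg_type, length = struct.unpack_from(">BH", stream, offset)
        offset += 3
        body = stream[offset:offset + length]
        if len(body) != length:
            raise ValueError("truncated body")
        offset += length
        yield msg_type, body
```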


danarmak added 5 commits February 26, 2015 22:52

Update NETWORK_PROTOCOL.md

a2ae999

WIP

63e575c

WIP

d486da6

First draft

99b8204

Added alternatives to protobuf

e4b08b7

benjchristensen reviewed Feb 27, 2015

benjchristensen mentioned this pull request Feb 27, 2015

Start Network Protocol Document #2

Rename subscribed to onSubscribe to conform to RS

aa9c935

danarmak mentioned this pull request Feb 27, 2015

RS.io over HTTP protocol draft #6

maniksurtani reviewed Mar 2, 2015

danarmak added 2 commits March 3, 2015 00:27

Change the message type and protocol version to a single Byte instead of a varint

866d51e

Clarified types used

74382fa

This sentence was present in a previous draft but was accidentally deleted.

danarmak added 2 commits March 3, 2015 00:46

Fix formatting of the first paragraph

a1bd203

Add a note: could add feature to resume broken connections

8aebcaa

tmontgomery reviewed Mar 13, 2015

danarmak mentioned this pull request Mar 13, 2015

Goals and Motivation #1

tmontgomery reviewed Mar 13, 2015

Clarified extension negotiation

59a9064

Reply to goodbye with goodbyeAck

3514340

This makes it clear to both parties which party closed the session.
