Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Decoding of Messages Received from Backend #1

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

moritzsternemann
Copy link
Owner

@moritzsternemann moritzsternemann commented Sep 18, 2022

Description

Implement decoding of the most common messages that can be received from memcached via the Meta Protocol. More information can be found in the protocol reference.

The code still has a few TODOs. Some of them just need to be implemented and some are rather questions on the design. I'll comment on the latter ones individually. Happy about all feedback! 🙌

To be implemented before merging

  • Decode error responses
  • Decode flags
  • Decode Meta Debug and Meta No-op responses
  • More tests for the decoding
  • Tests for the ByteBuffer CRLF extension

do {
// Pass the buffer instead of messageSlice because .value messages continue after the first \r\n
let result = try MemcacheBackendMessage.decode(from: &buffer, for: verb)
// TODO: Can we make sure the message was read entirely? Difficult because we don't know the length of VA messages here.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you have ideas!


extension MemcacheBackendMessage {
struct Flags: MemcacheMessagePayloadDecodable, Equatable, ExpressibleByArrayLiteral {
let flags: [String] // TODO: Do we want something like (Character, token: String?) instead? Or a struct?
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we try to parse the flags a bit further in this stage, we can catch malformed messages earlier and it might make parsing the meta-command-specific flags a bit easier. Flags are always a single character followed by an optional token string. Some tokens have a length limit but I don't think it makes sense to try to enforce it in this stage.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the flags should be an enum with associated values. And I think we should already enforce this here. I think users should be able to never decode anything after this stage and trust the values 100%.

Copy link
Owner Author

@moritzsternemann moritzsternemann Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree 👍 One potential issue however is that flags can have slightly different meanings depending on the sent command. For example, the N(token) flags is described as vivify on miss, takes TTL as argument for Meta Get, and as auto create item on miss with supplied TTL for Meta Arithmetic commands. I couldn't find any collisions so we just need to choose good names for the enum cases.

One thing we can't verify here though is which flags can be part of a message. This also depends on the sent command.

Edit: Or do you think we should just be unopinionated and directly use the flag characters as the enum case names?

Copy link
Owner Author

@moritzsternemann moritzsternemann Sep 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented decoding of flags, I think, as far as possible and added checks for things like data type and length where it makes sense.

Comment on lines 8 to 9
// Keep track of the reader index in case we later notice that we need more data
let startReaderIndex = buffer.readerIndex
Copy link

@fabianfett fabianfett Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of handling indexes so much, a better way is normally to get a copy of the bytebuffer.

var peekableBuffer = buffer

// if you were able to decode a buffer, just write the new reader indexes back:
buffer = peakableBuffer

Comment on lines 11 to 16
// Peek at the message to read the verb. It is before the first \r\n and before the first <space> if the message
// contains one.
guard let messageSlice = buffer.getCarriageReturnNewlineTerminatedSlice(at: buffer.readerIndex) else {
// reader index wasn't moved, wait for more bytes
return nil
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of finding the first line, if that doesn't ensure that this will be the complete first message? IIUC you are interested in the first VERB and you can get that by finding either a space or a \r\n correct?

Copy link
Owner Author

@moritzsternemann moritzsternemann Sep 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes correct. Though being able to read a line does not always indicate that we have a full message. For value messages (format: VA <data block size> <flags>\r\n<data block>\r\n), we only know how long the message should be when we start parsing the part after the verb.
If we would look for a space first, we might fail to parse the following buffer for example: EN\r\nHD <flags>\r\n.

Maybe the messageSlice name is also a bit misleading here 😇

Copy link

@fabianfett fabianfett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great progress! Added some more comments...

],
targets: [
.target(
name: "Memcache",
dependencies: [
.product(name: "NIO", package: "swift-nio")
.product(name: "NIO", package: "swift-nio"),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explicitly import NIOCore and NIOPosix. Plain NIO shall go away with the next major release.

],
targets: [
.target(
name: "Memcache",
dependencies: [
.product(name: "NIO", package: "swift-nio")
.product(name: "NIO", package: "swift-nio"),
.product(name: "ExtrasBase64", package: "swift-extras-base64")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH. I would vendor in the decoding/encoding part that you need. Just make sure you mention that you vendor those parts. Also make them internal.

// MARK: -

extension MemcacheFlag {
enum Code: Character {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need Character here. I think using UInt8 likely gives us better performance, since we don't need to go through UTF8 validity checks.

Comment on lines +16 to +22
struct NumericToken<Value: Numeric>: CustomDebugStringConvertible {
var value: Value

var debugDescription: String {
"numeric: \(value)"
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of the NumericToken type? Why can't we use the Numeric values directly? Add this reasoning to the type...

Comment on lines +3 to +13
struct StringToken: ExpressibleByStringLiteral, CustomDebugStringConvertible {
var value: String

init(stringLiteral value: String) {
self.value = value
}

var debugDescription: String {
"string: \(value)"
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we need this wrapper for?


/// Mode switch token used in Set and Arithmetic commands.
enum ModeToken: RawRepresentable, CustomDebugStringConvertible {
typealias RawValue = Character

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the RawValue = UInt8

@@ -0,0 +1,59 @@
import Foundation

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you need Foundation for here?


// Peek at the message to read the verb. It is before the first \r\n and before the first <space> if the message
// contains one.
guard let textLine = peekableBuffer.getCarriageReturnNewlineTerminatedSlice(at: peekableBuffer.readerIndex) else {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would read here instead of getting. In the error cases you can pass the buffer that you keep unmodified for now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually need to know the complete text line? Or is the goal really only to get the first verb to then learn how long the message will be?

Comment on lines +18 to +27
let verbLength = (textLine.readableBytesView.firstIndex(of: .space) ?? textLine.writerIndex) - textLine.readerIndex

guard let verbString = textLine.getString(at: textLine.readerIndex, length: verbLength) else {
// If we can't read a string, the text line must be empty (i.e. no characters before the first occurence of \r\n)
throw MemcacheDecodingError.emptyMessageReceived(bytes: peekableBuffer)
}

guard let verb = MemcacheBackendMessage.Verb(rawValue: verbString) else {
throw MemcacheDecodingError.unknownVerbReceived(messageVerb: verbString, messageBytes: peekableBuffer)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would put this into an extension on ByteBuffer, that I would call readVerb() throws -> MemcacheBackendMessage.Verb, which of course moves the readerIndex forward.

line: UInt = #line
) -> Self {
MemcacheDecodingError(
messageVerb: "",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the messageVerb should be an optional?

Copy link

@FranzBusch FranzBusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks really good already. Fabian left some good comments here and I added a few more. If we fix them up and then take another look I think we can make some good progress.

@@ -0,0 +1,53 @@
import ExtrasBase64

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this used here?

import ExtrasBase64
import NIOCore

struct MemcacheDecodingError: Error {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to have a public error type at some point. I just left a comment over in the kafka repository with a common error pattern we have established. I think it would be great to adopt this here for the public error type that we use in the end. Might be done in a follow up PR but just leaving this here

self.message = value
}

static func decode(from buffer: inout ByteBuffer) throws -> Self {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this a method on ByteBuffer probably like readErrorMessage

/// The following formats can be decoded from the `buffer`:
/// - `<flags>\r\n`. Flags are space-separated strings.
/// - `\r\n`. No flags.
static func decode(from buffer: inout ByteBuffer) throws -> Self {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Comment on lines +36 to +44
.split(separator: " ")
.map { flag in
guard let codeCharacter = flag.first,
let code = MemcacheFlag.Code(rawValue: codeCharacter)
else {
throw MemcachePartialDecodingError.fieldNotDecodable(as: MemcacheFlag.Code.self, from: String(flag))
}
return try .decode(from: flag.dropFirst(), for: code)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should look at that code complexity wise. It is iterating the flagsString multiple times from what I can see which we could avoid.

/// The message can have the following formats:
/// - `<size> <flags>\r\n<data block>\r\n`. Flags are space-separated strings.
/// - `<size>\r\n<data block>\r\n`
static func decode(from buffer: inout ByteBuffer) throws -> Self {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants