Description
Hi,
as part of a production pen test of an application that uses Aedes as its MQTT broker, the question came up of how the broker handles invalid UTF-8 code points in topic strings.
According to the MQTT spec (cf. http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/errata01/os/mqtt-v3.1.1-errata01-os-complete.html#_Toc442180829) and as highlighted by a whitepaper from Trend Micro (https://documents.trendmicro.com/assets/white_papers/wp-the-fragility-of-industrial-IoTs-data-backbone.pdf, section 2.1.2), a receiver MUST close the network connection when it receives ill-formed UTF-8 (including encodings of the surrogates U+D800..U+DFFF) or the null character U+0000, and it MAY close the network connection for some other code points (e.g. the control characters U+0001..U+001F and U+007F..U+009F). However, I could not find any such filtering in the code of either Aedes or mqtt-packet, which I assumed to be the relevant place for it (as it provides the parser for the topic).
The conformance statements from the spec:
The character data in a UTF-8 encoded string MUST be well-formed UTF-8 as defined by the Unicode specification [Unicode] and restated in RFC 3629 [RFC3629]. In particular this data MUST NOT include encodings of code points between U+D800 and U+DFFF. If a Server or Client receives a Control Packet containing ill-formed UTF-8 it MUST close the Network Connection [MQTT-1.5.3-1].
A UTF-8 encoded string MUST NOT include an encoding of the null character U+0000. If a receiver (Server or Client) receives a Control Packet containing U+0000 it MUST close the Network Connection [MQTT-1.5.3-2].
The data SHOULD NOT include encodings of the Unicode [Unicode] code points listed below. If a receiver (Server or Client) receives a Control Packet containing any of them it MAY close the Network Connection:
U+0001..U+001F control characters
U+007F..U+009F control characters
Code points defined in the Unicode specification [Unicode] to be non-characters (for example U+0FFFF)
A UTF-8 encoded sequence 0xEF 0xBB 0xBF is always to be interpreted to mean U+FEFF ("ZERO WIDTH NO-BREAK SPACE") wherever it appears in a string and MUST NOT be skipped over or stripped off by a packet receiver [MQTT-1.5.3-3].
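For illustration, here is a rough sketch of the kind of check I was expecting to find. It is not taken from Aedes or mqtt-packet: the function name `classifyMqttString` and its return values ('must-close', 'may-close', 'ok') are made up for this example. It works on the raw bytes of the string, because Node's `Buffer.prototype.toString('utf8')` replaces ill-formed sequences with U+FFFD, after which they can no longer be detected reliably.

```javascript
'use strict'

// Hypothetical helper, not part of Aedes or mqtt-packet.
// Takes the raw bytes of an MQTT UTF-8 string (Buffer or Uint8Array) and
// returns 'must-close', 'may-close' or 'ok' according to section 1.5.3.
function classifyMqttString (bytes) {
  let text
  try {
    // fatal: true     -> throw on ill-formed UTF-8, which includes encodings
    //                    of the surrogates U+D800..U+DFFF
    // ignoreBOM: true -> keep a leading U+FEFF instead of stripping it,
    //                    as required by [MQTT-1.5.3-3]
    text = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true }).decode(bytes)
  } catch (err) {
    return 'must-close' // [MQTT-1.5.3-1]
  }

  let verdict = 'ok'
  for (const ch of text) { // a string iterator walks code points, not UTF-16 units
    const cp = ch.codePointAt(0)

    if (cp === 0x0000) return 'must-close' // [MQTT-1.5.3-2]

    const isControl =
      (cp >= 0x0001 && cp <= 0x001f) || (cp >= 0x007f && cp <= 0x009f)

    // Unicode noncharacters: U+FDD0..U+FDEF plus the last two code points
    // of every plane (U+FFFE, U+FFFF, U+1FFFE, ..., U+10FFFF)
    const isNonCharacter =
      (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe

    if (isControl || isNonCharacter) verdict = 'may-close'
  }
  return verdict
}
```

A broker could close the connection on 'must-close' and decide by policy what to do on 'may-close'. Where such a check would best be hooked into the parser is a separate question that this sketch does not try to answer.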
Is this part of the spec simply not implemented anywhere, or am I looking at the wrong code base?