
Improve int.toUnicode() documentation #80

Open
@Marcono1234

Description

The documentation for the newly added int.toUnicode() predicate says:

Returns the unicode character for the receiver seen as a unicode code point

This is slightly misleading because CodeQL strings consist of UTF-16 code units. Therefore supplementary code points (> U+FFFF) result in two CodeQL string characters (demonstrated by this query). It would also be good to describe its behavior for invalid code point values. For surrogate code points it does not seem to have a result either, e.g. 55296.toUnicode().
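To illustrate why a supplementary code point becomes two CodeQL string characters, here is a minimal Python sketch (not CodeQL) of the standard UTF-16 surrogate-pair computation. The function name `utf16_code_units` is hypothetical, and the sketch assumes its input is already a valid, non-surrogate code point.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Split a Unicode code point into its UTF-16 code units.

    Assumes cp is in 0..0x10FFFF and is not a surrogate (no validation here).
    """
    if cp <= 0xFFFF:
        return [cp]  # Basic Multilingual Plane: a single code unit
    offset = cp - 0x10000            # 20-bit value spread over two units
    high = 0xD800 + (offset >> 10)   # high (lead) surrogate: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # low (trail) surrogate: bottom 10 bits
    return [high, low]


# U+1F600 is a supplementary code point, so it needs two code units.
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

The same count falls out of Python's own encoder: `len("\U0001F600".encode("utf-16-le")) // 2` is 2.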
Also, it should capitalize "Unicode".

I would recommend the following description (or similar):

Returns the Unicode character for the receiver seen as a Unicode code point. Because CodeQL strings consist of UTF-16 code units, supplementary code points (that is, > U+FFFF) result in a CodeQL string of length 2. This predicate has no result if the int receiver does not represent a valid Unicode code point, or represents the code point of a surrogate character.
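For illustration only, a minimal Python model of the semantics proposed above (this is not CodeQL's implementation). The names `to_unicode` and `codeql_length` are hypothetical, `None` stands in for "no result", and length is counted in UTF-16 code units as CodeQL does.

```python
from typing import Optional

MAX_CODE_POINT = 0x10FFFF


def to_unicode(receiver: int) -> Optional[str]:
    """Model of the proposed toUnicode() behavior; None means 'no result'."""
    if receiver < 0 or receiver > MAX_CODE_POINT:
        return None  # not a valid Unicode code point
    if 0xD800 <= receiver <= 0xDFFF:
        return None  # surrogate code point
    return chr(receiver)


def codeql_length(s: str) -> int:
    """String length as CodeQL counts it: the number of UTF-16 code units."""
    return len(s.encode("utf-16-le")) // 2


print(to_unicode(0x41))                      # A
print(codeql_length(to_unicode(0x1F600)))    # 2
print(to_unicode(55296))                     # None
```

Modeling "no result" as `None` mirrors how a QL predicate simply has no value for such receivers rather than raising an error.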

This requires changes to the built-in documentation (which is why I created the issue here) as well as the language specification.
