Description
The documentation for the newly added int.toUnicode() predicate says:
Returns the unicode character for the receiver seen as a unicode code point
This is slightly misleading because CodeQL strings consist of UTF-16 code units, not code points. Therefore supplementary code points (> U+FFFF) result in two CodeQL string characters (as demonstrated by this query). It might also be good to describe the predicate's behavior for invalid code point values. For surrogate code points it does not seem to have a result either, e.g. 55296.toUnicode() (U+D800).
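To illustrate why a supplementary code point turns into two string characters, here is a minimal Java sketch (Java strings are also sequences of UTF-16 code units). It splits a code point into a surrogate pair by hand; the class and method names are made up for illustration and are not part of CodeQL.

```java
// Sketch: how a supplementary code point (> U+FFFF) is split into a UTF-16
// surrogate pair, which is why it yields a string of length 2.
public class Utf16Demo {
    // Encodes a code point as UTF-16 by hand. Assumes 0 <= cp <= 0x10FFFF
    // and that cp is not itself a surrogate (0xD800..0xDFFF).
    static String encode(int cp) {
        if (cp <= 0xFFFF) {
            return String.valueOf((char) cp);          // BMP: one code unit
        }
        int v = cp - 0x10000;                          // 20-bit offset
        char high = (char) (0xD800 + (v >> 10));       // top 10 bits
        char low = (char) (0xDC00 + (v & 0x3FF));      // bottom 10 bits
        return new String(new char[] {high, low});
    }

    public static void main(String[] args) {
        String s = encode(0x1F600);
        System.out.println(s.length());                      // 2 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        // Cross-check against the JDK's own encoder.
        System.out.println(s.equals(new String(Character.toChars(0x1F600))));
    }
}
```

For U+1F600 this produces the pair U+D83D U+DE00: one code point, but a string of length 2.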
Also, "Unicode" should be capitalized.
I would recommend the following description (or similar):
Returns the Unicode character for the receiver seen as a Unicode code point. Because CodeQL strings consist of UTF-16 code units, supplementary code points (that is, code points > U+FFFF) result in a CodeQL string of length 2. This predicate has no result if the int receiver does not represent a valid Unicode code point, or if it represents the code point of a surrogate character.
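The proposed contract can be modeled roughly as follows. This is a hypothetical Java sketch, not the actual CodeQL implementation: the stand-in method toUnicode returns Optional.empty() wherever the predicate would have no result.

```java
import java.util.Optional;

// Hypothetical model of the proposed contract for int.toUnicode().
public class ToUnicodeModel {
    static Optional<String> toUnicode(int cp) {
        // No result for values outside U+0000..U+10FFFF (including negatives).
        if (!Character.isValidCodePoint(cp)) return Optional.empty();
        // No result for surrogate code points U+D800..U+DFFF.
        if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
            return Optional.empty();
        }
        // Character.toChars yields one char for BMP code points and a
        // surrogate pair (two chars) for supplementary code points.
        return Optional.of(new String(Character.toChars(cp)));
    }

    public static void main(String[] args) {
        System.out.println(toUnicode(65));                           // Optional[A]
        System.out.println(toUnicode(0x1F600).map(String::length));  // Optional[2]
        System.out.println(toUnicode(55296));                        // Optional.empty
        System.out.println(toUnicode(0x110000));                     // Optional.empty
    }
}
```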
This requires changes to the built-in documentation (which is why I created the issue here) as well as the language specification.