Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Details about wide-character support #26

Open
ghost opened this issue Mar 2, 2015 · 3 comments
Open

Details about wide-character support #26

ghost opened this issue Mar 2, 2015 · 3 comments

Comments

@ghost
Copy link

ghost commented Mar 2, 2015

Could you please detail what parts of multi-byte characters are supported by CompCert? Or, inversely, if there are specific multi-byte-related features which are not supported?

In particular, some compilers accept this code (e.g. GCC, Clang), while CompCert emits a warning (escape sequence is out of range (code 0x3B1)) and then crashes:

#include <stdio.h>
int main() {
  printf("\u03B1-conversion\n");
}

Also, whenever a wide character literal is used in a function such as wprintf, a warning similar to argument #1 of function call has type int * instead of the expected type wchar_t const * is emitted. This seems to impact the result of this function, which differs from GCC/Clang in examples such as:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
  setlocale(LC_ALL, "en_US.UTF-8");
  wchar_t c = L'\x3b1';
  wprintf(L"α %lc\n", c);
}

CompCert prints some nonsensical values such as α ? instead of α α.

@xavierleroy
Copy link
Contributor

I'm afraid CompCert's support for multi-byte characters in string and char literals is nonexistent at the moment, while support for wide strings and wide chars is flaky... None of that stuff was exercised by our current users.

There are two issues that can easily be fixed: the crash and the wrong initialization of c by L'\x3b1'.

I don't know yet how to go about UTF8 chars in string and wide string literals, though.

xavierleroy added a commit that referenced this issue Mar 7, 2015
- Error instead of warning if escape sequence overflows one character.
- Wrong normalization of L'x' to char instead of wchar_t.
- More careful overflow tests.
@xavierleroy
Copy link
Contributor

Fixed the two easy issues in commit [master d3959ce].

Chapter 5 of the CompCert manual does document that multi-byte characters are not supported.

I leave this support as a feature wish.

@ghost
Copy link
Author

ghost commented Mar 9, 2015

Excellent, thank you very much.

I do not have much use for wide strings, so it's probably not worth spending more time on it. The fixed crash is already more than enough for my needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant