fs: cannot interact with invalid UTF-16 filenames on Windows, even with Buffers #23735
Comments
I think the issue here is that libuv attempts automatic UTF-8 → UTF-16 conversion for Windows file paths… /cc @nodejs/libuv
Correct. From http://docs.libuv.org/en/v1.x/fs.html:
You feed it UTF-8 and libuv takes care of converting it to/from WCHAR.
Perhaps Node.js could override WideCharToMultiByte and MultiByteToWideChar to make libuv use WTF-8 instead of UTF-8?
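To illustrate why the choice of encoding matters here, the following is a minimal sketch, not code from libuv or Node.js. The helper name wtf8Encode is hypothetical. It contrasts Node's strict UTF-8 encoding, which replaces an unpaired surrogate with U+FFFD, with a WTF-8 style encoding that keeps the unpaired surrogate as a three-byte sequence so the original UTF-16 code units can be recovered.

```javascript
// Hypothetical helper: encode a JavaScript string (a sequence of UTF-16 code
// units) as WTF-8. Valid surrogate pairs become one 4-byte sequence, as in
// UTF-8; unpaired surrogates are kept as 3-byte sequences instead of being
// replaced with U+FFFD.
function wtf8Encode(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    // Combine a high surrogate with a following low surrogate.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (next - 0xdc00);
        i++;
      }
    }
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      // Unpaired surrogates (0xD800..0xDFFF) also take this branch.
      bytes.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      );
    }
  }
  return Buffer.from(bytes);
}

const lone = '\ud800'; // an unpaired high surrogate, invalid as UTF-16 text

// Strict UTF-8: the surrogate is lost and replaced by U+FFFD.
console.log(Buffer.from(lone, 'utf8').toString('hex')); // efbfbd

// WTF-8: the surrogate survives and can be mapped back to 0xD800.
console.log(wtf8Encode(lone).toString('hex')); // eda080
```

For strings that are already valid UTF-16, the two encodings produce identical bytes, so the difference only shows up for names containing unpaired surrogates.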
@bnoordhuis @seishun @addaleax Should this be labeled |
@Trott This might be possible to fix without changes in libuv, but I would like some input on my idea before I proceed with investigation.
It seems to me this might already be fixed in libuv, as libuv/libuv#2970 landed. Is this correct, @vtjnash? If that's the case, the next libuv release will have it, and when it lands in Node.js this will be fixed.
Yes. Might need testing, but that is the expectation as long as Node.js doesn't have a strictly-validating UTF-8 check in the way.
I believe this is fixed now. Closing, but holler if it should be reopened.
PR #5616 gave us support for Buffer paths in all fs methods, primarily to allow interacting with files of unknown or invalid file encoding. This helps on UNIX/Linux where filenames are technically just strings of bytes and do not necessarily represent a valid UTF-8 string.
Similarly, on Windows, filenames are just arrays of wchars and do not necessarily represent a valid UTF-16 string. However, the current { encoding: 'buffer' } variety of fs methods does not properly handle this case. Instead, the Buffers that are returned are UTF-8 representations of (potentially lossily / incorrectly) decoded UTF-16 filenames. Likewise, it's not possible to pass as input Buffers that represent the raw UTF-16 bytes. This leads to the possibility of files that Node can't interact with at all.
Consider the following code that makes a file that doesn't have a proper UTF-16 name. The created file can be seen and interacted with using Windows Explorer and Notepad without issue.
Then, running the following Node code in the same directory shows that the file cannot be accessed:
The above code produces the following output when run in the same dir as the invalid UTF-16 file:
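A minimal sketch of this kind of check, not the reporter's original snippet, is shown below. The helper name probeDirectory is hypothetical. It lists a directory with { encoding: 'buffer' } and then tries to stat each entry using the returned Buffer as the path, which is the round trip described above. On an affected Node.js version on Windows, an entry whose name is not valid UTF-16 would be expected to land in the failure branch; the exact error is not assumed here.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: list `dir` with Buffer names, then check whether each
// returned name can be passed back to fs as a Buffer path.
function probeDirectory(dir) {
  const results = [];
  const prefix = Buffer.from(dir + path.sep);
  for (const name of fs.readdirSync(dir, { encoding: 'buffer' })) {
    // Build the full path as raw bytes so no string decoding happens in JS.
    const full = Buffer.concat([prefix, name]);
    try {
      fs.statSync(full);
      results.push({ name, ok: true });
    } catch (err) {
      results.push({ name, ok: false, code: err.code });
    }
  }
  return results;
}

// Usage: report every entry in the current directory.
for (const r of probeDirectory('.')) {
  console.log(r.name.toString('hex'), r.ok ? 'accessible' : `failed (${r.code})`);
}
```

Printing the names as hex avoids a second lossy decode when displaying them.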
Refs:
#5616
rust-lang/rust#12056
jprichardson/node-fs-extra#612