Description
- Version: 10.12.0
- Platform: Windows 10 64-bit
- Subsystem: fs
PR #5616 gave us support for Buffer paths in all fs methods, primarily to allow interacting with files of unknown or invalid file encoding. This helps on UNIX/Linux where filenames are technically just strings of bytes and do not necessarily represent a valid UTF-8 string.
Similarly, on Windows, filenames are just arrays of wchars and do not necessarily represent a valid UTF-16 string. However, the current { encoding: 'buffer' } variants of the fs methods do not handle this case properly. Instead, the Buffers that are returned are UTF-8 representations of (potentially lossily / incorrectly) decoded UTF-16 filenames. Likewise, it's not possible to pass Buffers that represent the raw UTF-16 bytes as input. This leads to the possibility of files that Node can't interact with at all.
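To illustrate the kind of loss involved, here is a minimal sketch at the JavaScript level. It only shows that encoding a string containing an unpaired surrogate as UTF-8 is not reversible, while UTF-16LE is. It is an analogy, not a trace of the conversion Node performs internally on Windows, and the variable names are made up for this example.

```javascript
// 'h', 'i', an unpaired high surrogate (0xD801), then '7'.
const name = 'hi\uD801' + '7';

// UTF-8 cannot represent a lone surrogate, so Node substitutes U+FFFD (bytes EF BF BD).
const utf8 = Buffer.from(name, 'utf8');
// UTF-16LE copies the 16-bit code units as-is, so nothing is lost.
const utf16 = Buffer.from(name, 'utf16le');

// Decode each Buffer again and compare with the original string.
const utf8RoundTrip = utf8.toString('utf8') === name;
const utf16RoundTrip = utf16.toString('utf16le') === name;

console.log(utf8.toString('hex'), utf8RoundTrip);   // 6869efbfbd37 false
console.log(utf16.toString('hex'), utf16RoundTrip); // 6800690001d83700 true
```

Once the surrogate has been replaced with U+FFFD, the original code unit can no longer be recovered from the UTF-8 bytes, so a Buffer obtained this way no longer identifies the original file name.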
Consider the following code that makes a file that doesn't have a proper UTF-16 name. The created file can be seen and interacted with using Windows Explorer and Notepad without issue.
#include "stdafx.h"
#include <iostream>
#include <windows.h>
#include <string>
using namespace std;
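int main()
{
    // Unpaired high surrogate (0xD801) followed by '7': not valid UTF-16
    const wchar_t *filename = L"hi\xD801\x0037";
    HANDLE hfile = CreateFileW(filename, GENERIC_READ, 0, NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hfile == INVALID_HANDLE_VALUE)
        return 1; // creation failed, e.g. the file already exists
    CloseHandle(hfile); // release the handle so other programs can open the file
    return 0;
}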
Then, running the following Node code in the same directory shows that the file cannot be accessed:
const fs = require('fs');
const bufs = fs.readdirSync('.\\', { encoding: 'buffer' });
for (const buf of bufs) {
  try {
    const stat = fs.statSync(buf);
    console.log('successfully got stats of: ' + buf.toString('utf8'));
  } catch (err) {
    console.log('error getting stats of: ' + buf.toString('utf8'));
  }
}
The above code produces the following output when run in the same directory as the file with the invalid UTF-16 name:
error getting stats of: hi�7
successfully got stats of: test.js
Refs:
#5616
rust-lang/rust#12056
jprichardson/node-fs-extra#612