Add a boolean property to TextDocument if the encoding was guessed #209503

Open
Colengms opened this issue Apr 4, 2024 · 10 comments
Labels
api, feature-request, file-encoding, under-discussion, workbench-editors

@Colengms
Contributor

Colengms commented Apr 4, 2024

This is a request for API access to the original encoding of a file. i.e.:

[image]

(If this is already possible, please let me know. I couldn't find it.)

If I understand correctly, VS Code represents the content of open files to extensions as UTF-8. In the C/C++ Extension, there are scenarios in which IntelliSense should reflect how the compiler would interpret the file, which may depend on the file's actual encoding, e.g. how MSVC interprets the contents of string literals.
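To make the concern concrete, here is a minimal TypeScript sketch (not part of any VS Code or C/C++ Extension API) showing that the same bytes decode to different text depending on the assumed encoding. It assumes a file saved as UTF-8 that contains the literal `"é"`, read once as UTF-8 and once as a single-byte code page such as Windows-1252 (which a compiler might assume for a file without a BOM). The helper names `decodeLatin1` and `decodeUtf8Basic` are hypothetical.

```typescript
// Hypothetical helpers for illustration only; not a VS Code API.

// Decode bytes as ISO-8859-1, where each byte value is its own code point.
// Windows-1252 agrees with this everywhere except bytes 0x80-0x9F.
function decodeLatin1(bytes: readonly number[]): string {
  return bytes.map((b) => String.fromCharCode(b)).join("");
}

// Simplified UTF-8 decoder: handles 1- and 2-byte sequences only, does not
// reject overlong forms, and substitutes U+FFFD for anything else.
function decodeUtf8Basic(bytes: readonly number[]): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const b0 = bytes[i];
    if (b0 < 0x80) {
      out += String.fromCharCode(b0); // ASCII byte
    } else if ((b0 & 0xe0) === 0xc0 && i + 1 < bytes.length && (bytes[i + 1] & 0xc0) === 0x80) {
      const b1 = bytes[i + 1];
      out += String.fromCharCode(((b0 & 0x1f) << 6) | (b1 & 0x3f));
      i++; // consume the continuation byte
    } else {
      out += "\ufffd"; // unsupported or invalid sequence in this sketch
    }
  }
  return out;
}

// The bytes of "é" (including the quotes) as saved by a UTF-8 editor.
const literalBytes = [0x22, 0xc3, 0xa9, 0x22];

const asUtf8 = decodeUtf8Basic(literalBytes); // "é" with quotes: 3 UTF-16 code units
const asLatin1 = decodeLatin1(literalBytes);  // "Ã©" with quotes: 4 UTF-16 code units
```

The string literal's contents (and its length) differ between the two readings, which is why IntelliSense needs to know which encoding the compiler will actually apply.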

bpasero changed the title from "Feature Request: API access to open file's original encoding" to "API access to open file's original encoding" Apr 7, 2024
bpasero added the feature-request, api, workbench-editors, and file-encoding labels Apr 7, 2024
bpasero removed their assignment Apr 7, 2024
bpasero added this to the Backlog milestone Apr 7, 2024
@Colengms
Contributor Author

Colengms commented Dec 12, 2024

In addition to the original encoding, it would also be very helpful to have access to a boolean indicating whether the file was decoded using the encoding specified by the files.encoding setting (meaning that its encoding could not be auto-detected). The C/C++ Extension indexes files in the workspace and, when reading them directly from disk, honors the files.encoding setting if it is unable to detect the encoding. When that setting changes, we should reparse files, preferably only the ones affected by the change. We can determine this ourselves for files we read directly from disk. However, for files open in the editor, we use the file buffer provided by VS Code, so a change to files.encoding could mean that the buffer we used may not have been decoded correctly and should be reparsed.
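A minimal sketch of the selective reparse described above, assuming a hypothetical per-document flag `usedFilesEncodingFallback`. That flag is the information being requested and is not an existing VS Code API. The interface and function names are also hypothetical. Only documents that were decoded with the files.encoding fallback are candidates for reparsing when that setting changes.

```typescript
// Hypothetical shape: `usedFilesEncodingFallback` is the requested flag,
// not an existing VS Code API.
interface OpenDocumentInfo {
  uri: string;
  encoding: string;                   // e.g. "utf8", "windows1252"
  usedFilesEncodingFallback: boolean; // true if detection failed and files.encoding was applied
}

// Return the URIs of open documents whose decoded buffer may be stale
// after files.encoding changes from `oldSetting` to `newSetting`.
function documentsToReparse(
  docs: readonly OpenDocumentInfo[],
  oldSetting: string,
  newSetting: string,
): string[] {
  if (oldSetting === newSetting) {
    return []; // setting unchanged: nothing to reparse
  }
  // Documents whose encoding came from a BOM, detection, or an explicit
  // choice are unaffected by the setting, so they are skipped.
  return docs.filter((d) => d.usedFilesEncodingFallback).map((d) => d.uri);
}
```

For example, with one header decoded via the fallback and one source file with a UTF-8 BOM, changing files.encoding selects only the header.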

@bpasero
Member

bpasero commented Dec 18, 2024

Related: #824

@bpasero
Member

bpasero commented Feb 4, 2025

I would like to go ahead and merge this into #824, as per the original request. #209503 (comment) seems unrelated/additive, so I suggest reporting it as a separate new issue.

bpasero added the info-needed and under-discussion labels Feb 4, 2025

This issue has been closed automatically because it needs more information and has not had recent activity. See also our issue reporting guidelines.

Happy Coding!

vs-code-engineering bot closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 12, 2025
@bpasero
Member

bpasero commented Feb 13, 2025

@Colengms 👋 friendly ping: your input is requested on #824 (comment)

@Colengms
Contributor Author

Hi @bpasero. Re: #824 (comment), would it be possible to add just one more boolean indicating whether the reported encoding was a guess, because autoGuessEncoding is set to true? The C/C++ Extension loads files directly from disk from a native process (such as headers that are not open in the editor but are #include'd by an open file). If such a header is then opened in the editor, we switch to using the editor's buffer, and we can encounter offset inconsistencies internally if our interpretation of the file doesn't match VS Code's guess. That boolean would help us detect and correct this.
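One way to see where offset inconsistencies come from: TextDocument offsets index a JavaScript string (UTF-16 code units), while a native process reading the raw file may index bytes. The mapping between the two depends on the encoding. The sketch below is a minimal illustration that handles only two example encodings, `'utf8'` and a single-byte code page such as `'windows1252'`. It assumes every character is representable in that code page and the text is well-formed. The function name is hypothetical.

```typescript
// Hypothetical helper: convert an offset into a JavaScript string
// (UTF-16 code units) into a byte offset in the file on disk.
type ExampleEncoding = "utf8" | "windows1252";

function utf16OffsetToByteOffset(text: string, offset: number, encoding: ExampleEncoding): number {
  const end = Math.min(offset, text.length);
  if (encoding === "windows1252") {
    // One byte per character, assuming every character before `end`
    // exists in the code page (so there are no surrogate pairs either).
    return end;
  }
  let bytes = 0;
  let i = 0;
  while (i < end) {
    const unit = text.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
      i += 1;
    } else if (unit < 0x800) {
      bytes += 2;
      i += 1;
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      bytes += 4; // surrogate pair = one astral code point = 4 UTF-8 bytes
      i += 2;
    } else {
      bytes += 3; // remaining BMP code points
      i += 1;
    }
  }
  return bytes;
}
```

For `"é;x"`, the offset of `x` is 2 in the editor buffer. It is byte 3 if the file is UTF-8, but byte 2 if it is Windows-1252. If VS Code guessed one encoding and the native process assumed the other, positions exchanged between them drift by one for every such character.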

@bpasero
Copy link
Member

bpasero commented Feb 14, 2025

I do not think we would add that to the API, given how little the autoGuessEncoding setting is used across our user base. Why would you need this specifically?

@Colengms
Contributor Author

Colengms commented Feb 14, 2025

> Why would you need this specifically?

Hi @bpasero. If, in addition to the original encoding, we could get this boolean and it were false, that would indicate that no further work is needed. Only when files.autoGuessEncoding is in effect do we have further work to do. We already track changes to files.encoding and interpret non-guessed encodings the same way VS Code does. The boolean is relevant because it disambiguates whether the reported encoding is the file's actual encoding (based on a BOM, common UTF-16 detection, or explicit user preference) or a guess. It would seem pretty simple to include.
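The decision described above can be sketched as follows. The sketch assumes a hypothetical `encodingWasGuessed` flag alongside the `encoding` property. That flag is what is being requested here and does not exist in VS Code; `reconcile` and `ownEncoding` are likewise illustrative names. When the flag is false, the extension's own reading of the file is assumed to already match VS Code's.

```typescript
// Hypothetical document view: `encoding` mirrors TextDocument.encoding;
// `encodingWasGuessed` is the requested flag, not an existing API.
interface DocumentEncodingView {
  encoding: string;
  encodingWasGuessed: boolean;
}

type ReconcileAction = "none" | "reparseFromEditorBuffer";

// `ownEncoding` is the encoding the extension's native process chose when
// it read the same file from disk.
function reconcile(doc: DocumentEncodingView, ownEncoding: string): ReconcileAction {
  if (!doc.encodingWasGuessed) {
    // BOM, UTF-16 detection, or explicit user preference: assumed to be
    // interpreted identically by the extension, so nothing to do.
    return "none";
  }
  // Guessed: only act if VS Code's guess disagrees with our own reading.
  return doc.encoding === ownEncoding ? "none" : "reparseFromEditorBuffer";
}
```

With the flag false the function short-circuits, which is the "no further work is needed" case. Only a guessed encoding that disagrees with the extension's own choice triggers a reparse.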

Note that some scenarios our users are running into may currently only be addressable with files.autoGuessEncoding, because files.encoding is too heavy a hammer: it affects the user's own files, their system includes, and all of their library headers. See microsoft/vscode-cpptools#11407 (comment).

bpasero reopened this Feb 15, 2025
bpasero removed the info-needed label Feb 15, 2025
bpasero changed the title from "API access to open file's original encoding" to "Add a boolean property to TextDocument if the encoding was guessed" Feb 15, 2025
@bpasero
Member

bpasero commented Feb 15, 2025

What specifically is the difference between a text document whose encoding was guessed and one whose encoding was configured by the user or set explicitly after opening, especially since guessing very often yields the wrong encoding?

@bpasero
Copy link
Member

bpasero commented Feb 18, 2025

@Colengms By the way, the latest VS Code Insiders ships with the encoding property. Maybe you can already use it?

```typescript
declare module 'vscode' {
  // https://github.com/microsoft/vscode/issues/824
  export interface TextDocument {
    /**
     * The file encoding of this document that will be used when the document is saved.
     *
     * Use the {@link workspace.onDidChangeTextDocument onDidChangeTextDocument}-event to
     * get notified when the document encoding changes.
     *
     * Note that the possible encoding values are currently defined as any of the following:
     * 'utf8', 'utf8bom', 'utf16le', 'utf16be', 'windows1252', 'iso88591', 'iso88593',
     * 'iso885915', 'macroman', 'cp437', 'windows1256', 'iso88596', 'windows1257',
     * 'iso88594', 'iso885914', 'windows1250', 'iso88592', 'cp852', 'windows1251',
     * 'cp866', 'cp1125', 'iso88595', 'koi8r', 'koi8u', 'iso885913', 'windows1253',
     * 'iso88597', 'windows1255', 'iso88598', 'iso885910', 'iso885916', 'windows1254',
     * 'iso88599', 'windows1258', 'gbk', 'gb18030', 'cp950', 'big5hkscs', 'shiftjis',
     * 'eucjp', 'euckr', 'windows874', 'iso885911', 'koi8ru', 'koi8t', 'gb2312',
     * 'cp865', 'cp850'.
     */
    readonly encoding: string;
  }
}
```
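According to the doc comment, encoding changes are signalled through `onDidChangeTextDocument`. Based on that, an extension might remember the last encoding it saw for each document and react only when that value changes. The sketch below keeps that bookkeeping in plain TypeScript so it runs without the `vscode` module. The class name `EncodingTracker` is hypothetical, and the wiring to the real event appears only in a comment.

```typescript
// Hypothetical bookkeeping class; not part of the VS Code API.
class EncodingTracker {
  private readonly lastSeen: { [uri: string]: string } = {};

  // Record `encoding` for `uri`; return true if it differs from the
  // previously recorded value (a first sighting is not a change).
  update(uri: string, encoding: string): boolean {
    const previous = Object.prototype.hasOwnProperty.call(this.lastSeen, uri)
      ? this.lastSeen[uri]
      : undefined;
    this.lastSeen[uri] = encoding;
    return previous !== undefined && previous !== encoding;
  }

  // Drop state for a closed document.
  forget(uri: string): void {
    delete this.lastSeen[uri];
  }
}

// In an extension this might be wired up roughly as (not runnable here):
//   vscode.workspace.onDidChangeTextDocument((e) => {
//     if (tracker.update(e.document.uri.toString(), e.document.encoding)) {
//       // encoding changed: invalidate cached offsets / reparse this document
//     }
//   });
```

Keying the map by the URI string keeps the sketch independent of the `vscode.Uri` type. Calling `forget` on document close prevents a reopened file from being reported as a change.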
