Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip UTF-8 BOM mark in EncodingDetectingInputStream and default to UTF-8 in RewriteTest #4546

Merged
merged 10 commits into from
Oct 3, 2024

Conversation

knutwannheden
Copy link
Contributor

@knutwannheden knutwannheden commented Oct 3, 2024

As the EncodingDetectingInputStream is only used as input for the parsers, we typically don't want to see any UTF-8 BOM marker. Additionally, platforms like .NET remove the BOM mark as well, so this change brings better compatibility.

The EncodingDetectingInputStream now also has less runtime overhead. Especially in cases when the charset was either already detected or specified by the caller.

Finally, RewriteTest will now default to parsing the source files using UTF-8, whereas it would before let the EncodingDetectingInputStream try to detect the encoding. When another encoding is required (or the test explicitly wants the encoding to be detected), the test can use RecipeSpec#executionContext(ExecutionContext) together with ParsingExecutionContextView#setCharset().

As the `EncodingDetectingInputStream` is only used as input for the parsers, we typically don't want to see any UTF-8 BOM marker. Additionally, platforms like .NET remove the BOM mark as well, so this change brings better compatibility.
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions could not be made:

  • rewrite-java/src/test/java/org/openrewrite/java/JavadocPrinterTest.java
    • lines 61-61

@knutwannheden knutwannheden changed the title Skip UTF-8 BOM mark in EncodingDetectingInputStream Skip UTF-8 BOM mark in EncodingDetectingInputStream and default to UTF-8 in RewriteTest Oct 3, 2024
@knutwannheden knutwannheden merged commit c63ab56 into main Oct 3, 2024
2 checks passed
@knutwannheden knutwannheden deleted the bom-mark branch October 3, 2024 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

1 participant