Skip UTF-8 BOM mark in `EncodingDetectingInputStream` and default to UTF-8 in `RewriteTest` #4546

knutwannheden · 2024-10-03T06:35:33Z

As the `EncodingDetectingInputStream` is only used as input for the parsers, we typically don't want the parsers to see a UTF-8 BOM. Platforms like .NET strip the BOM as well, so this change also improves compatibility.
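The BOM-skipping idea can be sketched as a standalone wrapper. This is a minimal, hypothetical illustration (the class name `BomSkippingInputStream` is an assumption, and this is not the actual `EncodingDetectingInputStream` code). It peeks at the first three bytes through a `PushbackInputStream` and drops them only if they are exactly `EF BB BF`. Any other bytes are pushed back, so short, empty, or BOM-less input passes through unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: hides a leading UTF-8 BOM (EF BB BF) from whoever reads this stream. */
final class BomSkippingInputStream extends InputStream {
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    private final PushbackInputStream in;
    private boolean bomChecked;

    BomSkippingInputStream(InputStream delegate) {
        // Pushback capacity of 3 is enough to "un-read" a non-BOM prefix.
        this.in = new PushbackInputStream(delegate, UTF8_BOM.length);
    }

    /** Runs once, lazily, on the first read. */
    private void skipBomOnce() throws IOException {
        if (bomChecked) return;
        bomChecked = true;
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // Read up to three bytes; an empty or very short stream simply yields fewer.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break;
            n += r;
        }
        boolean isBom = n == UTF8_BOM.length;
        for (int i = 0; isBom && i < n; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }
        if (!isBom && n > 0) {
            in.unread(head, 0, n); // not a BOM: restore the bytes untouched
        }
    }

    @Override
    public int read() throws IOException {
        skipBomOnce();
        return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        skipBomOnce();
        return in.read(b, off, len);
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        try (InputStream s = new BomSkippingInputStream(new ByteArrayInputStream(withBom))) {
            System.out.println(new String(s.readAllBytes(), StandardCharsets.UTF_8)); // prints "hi"
        }
    }
}
```

The check is lazy and bounded to three bytes. Because of that, empty files and one- or two-byte inputs need no special casing in this sketch.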

The `EncodingDetectingInputStream` now also has less runtime overhead, especially when the charset has already been detected or was specified by the caller.
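The "charset already specified" fast path can be sketched with a small hypothetical helper, `CharsetResolver`, which is not part of OpenRewrite. When the caller supplies a charset, it is returned without inspecting a single byte. Only otherwise are the bytes scanned. As an assumption for this sketch, detection is reduced to a strict UTF-8 validity check with ISO-8859-1 as the fallback. The real stream's detection rules may differ.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: skip detection entirely when the caller already knows the charset. */
final class CharsetResolver {
    private CharsetResolver() {}

    static Charset resolve(byte[] content, Charset requested) {
        if (requested != null) {
            return requested; // fast path: caller wins, no per-byte work at all
        }
        // Assumed fallback for this sketch only: ISO-8859-1 when the bytes are not valid UTF-8.
        return isValidUtf8(content) ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1;
    }

    static boolean isValidUtf8(byte[] content) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(content)); // throws on malformed or truncated sequences
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] cafe = {'c', 'a', 'f', (byte) 0xE9};                 // "café" in ISO-8859-1, invalid as UTF-8
        System.out.println(resolve(cafe, null));                    // ISO-8859-1 (detected)
        System.out.println(resolve(cafe, StandardCharsets.UTF_8));  // UTF-8 (specified, no scan)
    }
}
```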

Finally, `RewriteTest` now defaults to parsing source files as UTF-8; previously it let the `EncodingDetectingInputStream` try to detect the encoding. When another encoding is required (or the test explicitly wants the encoding to be detected), the test can use `RecipeSpec#executionContext(ExecutionContext)` together with `ParsingExecutionContextView#setCharset()`.


github-actions

Some suggestions could not be made:

rewrite-java/src/test/java/org/openrewrite/java/JavadocPrinterTest.java
- lines 61-61

knutwannheden added 2 commits October 3, 2024 08:35

Add test for BOM skipping

13cae90

github-actions bot reviewed Oct 3, 2024


knutwannheden added 8 commits October 3, 2024 10:15

Improve performance of EncodingDetectingInputStream by a lot

7fad61b

Fix JavadocPrinterTest

bc4a799

Use UTF-8 as default encoding in RewriteTest

3be85b7

Fix bug when reading from single byte stream

f9ebfad

Add missing test

c0ccb10

Adjust CompilationUnitTest whitespace

84c589a

Fix handling of empty files

da76451

Polish one more test case

37aa0d4

knutwannheden changed the title from ~~Skip UTF-8 BOM mark in EncodingDetectingInputStream~~ to Skip UTF-8 BOM mark in EncodingDetectingInputStream and default to UTF-8 in RewriteTest Oct 3, 2024

knutwannheden merged commit c63ab56 into main Oct 3, 2024
2 checks passed

knutwannheden deleted the bom-mark branch October 3, 2024 09:23
