Avoid blindly re-encoding HTML files
Previously, HTML files were stripped of their XML processing instruction (PI) headers and re-encoded from UTF-8 to HTML-ENTITIES before being fed into DOMDocument. This caused problems for documents with CDATA blocks that contained Unicode, because escaping such content as HTML entities is not correct in the general case: CSS and binary data, for example, don't use that escaping system.

Instead, load the document directly and remove the PI nodes afterwards.

Bug: https://phabricator.wikimedia.org/T271390
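Why entity-escaping breaks such content can be illustrated, as a rough analogue, with Python's standard `html.parser`, which treats `<style>` contents as raw text and does not decode character references inside it. The `to_html_entities` helper is a hypothetical stand-in for the UTF-8 to HTML-ENTITIES conversion described above, not the code this commit changes, and `TextCollector`/`parse` are illustrative names.

```python
from html.parser import HTMLParser


def to_html_entities(text: str) -> str:
    """Hypothetical stand-in for the old re-encoding step: every non-ASCII
    character becomes a numeric character reference such as &#233;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


class TextCollector(HTMLParser):
    """Collects the text that html.parser reports for each element."""

    def __init__(self) -> None:
        # convert_charrefs=True (the default): references in normal text are decoded.
        super().__init__()
        self.stack: list[str] = []
        self.text: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost open element.
        if self.stack:
            tag = self.stack[-1]
            self.text[tag] = self.text.get(tag, "") + data


def parse(html: str) -> dict[str, str]:
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return collector.text


doc = '<p>café</p><style>p::before { content: "é"; }</style>'
direct = parse(doc)
escaped = parse(to_html_entities(doc))
print(direct["p"], "|", escaped["p"])
print(direct["style"], "|", escaped["style"])
```

With CPython's parser, the paragraph text decodes back to `café` either way, but the escaped stylesheet keeps the literal characters `&#233;`, which CSS does not interpret as `é`.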
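The "load directly, then remove the PI nodes" approach can be sketched in Python with `xml.dom.minidom` on an XML document. This is an assumption-laden analogue rather than the PHP `DOMDocument` code: the PI targets below are hypothetical, and minidom does not expose the `<?xml ...?>` declaration as a PI node, so only the other PIs are removed. `load_and_strip_pis` is an illustrative name.

```python
from xml.dom import Node, minidom


def load_and_strip_pis(data: bytes) -> minidom.Document:
    """Parse the bytes directly, then remove processing-instruction nodes."""
    # No pre-processing: the parser sees the original UTF-8 bytes.
    document = minidom.parseString(data)
    # Walk the whole tree and detach every PI node after parsing.
    pending = [document]
    while pending:
        node = pending.pop()
        for child in list(node.childNodes):  # copy, because we mutate childNodes
            if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
                node.removeChild(child)
            else:
                pending.append(child)
    return document


source = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?example-pi hypothetical?>\n'
    '<html><body><?another-pi also hypothetical?>'
    '<style><![CDATA[p::before { content: "é"; }]]></style>'
    '</body></html>'
).encode("utf-8")

doc = load_and_strip_pis(source)
style = doc.getElementsByTagName("style")[0]
# Text and CDATA section nodes both expose .data; the Unicode is untouched.
print("".join(child.data for child in style.childNodes))
```

Because nothing is re-encoded before parsing, the CDATA content reaches the DOM exactly as written, and the PI cleanup happens on the parsed tree instead of on the raw text.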