Skip to content

Post check individual XML files for know issues #241

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 22, 2025
Merged

Conversation

alfsb
Copy link
Member

@alfsb alfsb commented Apr 11, 2025

This is a PR that increments warnings in almost all translations, so I ask for comments for a longer time, before commiting.

With this PR, when a configure.php fails on the first XML loading, it will run scripts/broken.php, to help debug issues that are present in one individual file.

This PR also will run scripts/broken.php as the last step of a successful configure.php run, to report "invisible" issues that may be accumulating, even if the language manual builds and validates.

The warnings are:

  • Error: Opening and ending tag mismatch:: Yes, there are XML files that are just plain broken, in some repositories.

  • Error: Start tag expected, '<' not found: These are mostly files named entities.foo.bar.xml. They are auto generated, and should not exist in translations. Remove these files.

  • XML file with BOM. Several tools may misbehave.: Files with BOM exercise different code paths inside libxml, to the point of generating unrelated warnings.

  • Error: Input is not proper UTF-8, indicate encoding !: Files not encoded as UTF-8.

  • Namespace prefix xlink for href on link is not defined: Mostly harmless, but fixing this issues helps running libxml with more strict options. Also individual XML files become more "correct" with proper name spaced prefixes.

  • Error: Extra content at the end of the document / Hint: Dir is marked with .xmlfragmentdir on doc-en? If not, check entity references. This warning has two false positives in the manual, that can be disabled by touching and committing an .xmlfragmentdir file on the directory. This directory will soon be moved/erased after "XML Entities" project advances, so it may be worth it to disable this warning in the meantime.

  • XML file contains \r. Several tools may misbehave.: The most common issue. Only shows when running on Linux and similar OSes. All languages in manual converged to a \n only EOL, at git repository level. Several userland tools only care for \n and will severely misbehave when dealing with text files that contain \r. Simply running dos2unix on these files fixes the issue.

I would like to hear comments for all points if possible, but more importantly on the last point, the \r one, as it is by far the most frequent one, as it will generate pages of warnings in some translations, if/when this PR is merged.

@philip
Copy link
Member

philip commented Apr 12, 2025

Thinking aloud, add the ability to run dos2unix on all files that emit the "XML file contains \r." error? Similar for the other simpler errors too, like removing the BOM.

@alfsb
Copy link
Member Author

alfsb commented Apr 12, 2025

Thinking aloud, add the ability to run dos2unix on all files that emit the "XML file contains \r." error? Similar for the other simpler errors too, like removing the BOM.

This would be possible, sure. But I'm wary of situation of one translator makes a one line change in one file, and upon running configure.php, it ends up with a hundred modified files, all lines changed.

@philip
Copy link
Member

philip commented Apr 12, 2025

This would be possible, sure. But I'm wary of situation of one translator makes a one line change in one file, and upon running configure.php, it ends up with a hundred modified files, all lines changed.

Agreed it wouldn't be automatic and would be part of a different operation. Whether that be php doc-base/scripts/fix-broken.php --issue newlines I'm not sure.

@alfsb
Copy link
Member Author

alfsb commented Apr 14, 2025

Pushed a new --dos2unix option, directly on broken.php, also with some output documentation improvements.

@philip
Copy link
Member

philip commented Apr 16, 2025

Nice @alfsb, LGTM! Tested php doc-base/configure.php --with-lang=it and php doc-base/scripts/broken.php --dos2unix ./it with success!

The minor issues (barely worth mentioning but the QA tester in me can't help it!) are (a) seeing no output (usage info) when passing in --dos2unix but forgetting the path and (b) it executing when dos2unix isn't installed. Again, minor, maybe worth addressing if these tools are ever refactored for fun.

@alfsb
Copy link
Member Author

alfsb commented Apr 16, 2025

The minor issues (barely worth mentioning but the QA tester in me can't help it!) are (a) seeing no output (usage info) when passing in --dos2unix but forgetting the path and (b) it executing when dos2unix isn't installed. Again, minor, maybe worth addressing if these tools are ever refactored for fun.

These are valid good points, and I found some of them while developing. But in the end I plan to remove the --dos2unix option, after the languages with big lists are fixed. I hacked --dos2unix into broken.php temporarily, to avoid duplicating most of the code of broken.php into a separate utility, that only will be used in the weeks following the merge of this PR.

@alfsb
Copy link
Member Author

alfsb commented Apr 17, 2025

Plan to merge this next week, and then waiting a week or so before opening issues on translations about new warnings that configure are generating (XInclude and broken).

mooreroger

This comment was marked as spam.

Copy link

@mooreroger mooreroger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@alfsb alfsb merged commit 342c72d into php:master Apr 22, 2025
12 checks passed
@alfsb
Copy link
Member Author

alfsb commented Apr 22, 2025

Merged. I will wait a few days/weeks before opening issues, instructing languages to refer this PR in how to address the new warnings.

@alfsb alfsb mentioned this pull request Apr 22, 2025
29 tasks
@alfsb alfsb deleted the docbroken2 branch April 23, 2025 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants