Want to sanity-check the generated nut-website #52

jimklimov opened this issue May 7, 2024 · 4 comments
@jimklimov
Member

I have a FOSS project whose web site is generated by asciidoc and some custom scripts as a horde (thousands) of static files. These are built locally in the source repository, copied into another workspace, uploaded to a github.io-style repository, and eventually served over HTTP for browsers around the world to see.

Users occasionally report that some of the links between site pages end up broken (lead nowhere).

The website build platform is generally POSIX-ish, although most often the agent doing the regular work is a Debian/Linux one. Maybe the platform differences cause the "page outages"; maybe this bug is platform-independent.

I am thinking of crafting a check that crawls all relative links (and/or absolute ones starting with the site's domain names) in the two local directories as well as on the resulting site, and reports any broken pages, so I could focus on finding why they fail and/or avoid publishing "bad" iterations - same as with compilers, debuggers and warnings elsewhere.

The general train of thought is to use some wget spider mode, though any other command-line tool (curl, lynx...), python script, shell with sed, etc. would do as well. Surely this particular wheel has been invented too many times for me to even think about making my own? A quick, cursory search during my commute did not turn up a good fit, however.
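To make the idea concrete, here is a minimal sketch of such a check for a locally generated tree, using only the Python standard library. The names `LinkCollector` and `find_broken_links` are made up for illustration, and the sketch assumes that every relative `href`/`src` should resolve to an existing file or directory on disk. It does not check `#fragment` targets or external URLs.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def find_broken_links(site_root):
    """Return sorted (page, link) pairs whose local target is missing on disk."""
    broken = []
    for dirpath, _dirs, files in os.walk(site_root):
        for fname in files:
            if not fname.endswith((".html", ".htm")):
                continue
            page = os.path.join(dirpath, fname)
            parser = LinkCollector()
            with open(page, encoding="utf-8", errors="replace") as fh:
                parser.feed(fh.read())
            for link in parser.links:
                parts = urlsplit(link)
                # Skip external URLs (http:, mailto:, //host/...) and pure "#anchor" links.
                if parts.scheme or parts.netloc or not parts.path:
                    continue
                path = unquote(parts.path)
                if path.startswith("/"):
                    # Site-absolute link: resolve against the site root (an assumption).
                    target = os.path.join(site_root, path.lstrip("/"))
                else:
                    target = os.path.join(dirpath, path)
                if not os.path.exists(target):
                    broken.append((os.path.relpath(page, site_root), link))
    return sorted(broken)
```

A run over a generated tree could then be as simple as `for page, link in find_broken_links("output"): print(page, link)`, where `output` is whatever directory holds the generated site.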

So, suggestions are welcome :)

Posted as a question at https://unix.stackexchange.com/questions/775994/how-to-check-consistency-of-a-generated-web-site-using-recursive-html-parsing

@jimklimov
Member Author

One promising suggestion was https://github.com/gjtorikian/html-proofer (packaged in Debian in ruby-html-proofer) - it does at least report a few hundred issues, inside the site and outside it (with third-party pages we refer to), so definitely something for me to chew on :)

@jimklimov jimklimov added the bug label May 7, 2024
jimklimov added a commit to networkupstools/networkupstools.github.io that referenced this issue May 7, 2024
jimklimov added a commit to jimklimov/nut-website that referenced this issue May 7, 2024
The HTMLPROOFER found many references to this icon file, unfulfilled.

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to jimklimov/nut-website that referenced this issue May 8, 2024
@jimklimov
Member Author

jimklimov commented May 8, 2024

Ran an htmlproofer analysis of the currently generated site. It took several hours and produced a number of complaints to handle subsequently:
https://ci.networkupstools.org/view/InfraTasks/job/nut-website/6193/pipeline-console/?selected-node=13

Many complaints about internal anchored links in output/protocols/apcsmart.html

Several other documents have their own misnamed(?) anchor links, as well as bad original asciidoc tags pointing to nearby documents (can be good for text browsing, but apparently not for resulting HTML).

Selected example complaints (overall there are about 700, although many patterns repeat in different locations):

- /home/jim/nut-website/networkupstools.github.io/docs/FAQ.html
  *  internally linking to UPGRADING, which does not exist (line 2)
     <a class="ulink" href="UPGRADING" target="_top">UPGRADING</a>
  *  internally linking to docbook-xsl.css, which does not exist (line 2)
     <link rel="stylesheet" type="text/css" href="docbook-xsl.css">

  *  internally linking to docs/config-notes.txt, which does not exist (line 86)
     <a class="ulink" href="docs/config-notes.txt" target="_top">docs/config-notes.txt</a>
  *  internally linking to https://www.networkupstools.org/cables/940-0024C.jpg, which does not exist (line 68)
     <a class="ulink" href="https://www.networkupstools.org/cables/940-0024C.jpg" target="_top">https://www.networkupstools.org/cables/940-0024C.jpg</a>
  *  internally linking to scheduling.txt, which does not exist (line 245)
     <a class="ulink" href="scheduling.txt" target="_top">scheduling.txt</a>
  *  internally linking to security.txt, which does not exist (line 236)
     <a class="ulink" href="security.txt" target="_top">security.txt</a>
  *  internally linking to security.txt, which does not exist (line 240)
     <a class="ulink" href="security.txt" target="_top">security.txt</a>
  *  internally linking to upssched.txt, which does not exist (line 257)
     <a class="ulink" href="upssched.txt" target="_top">upssched.txt</a>
- /home/jim/nut-website/networkupstools.github.io/docs/developer-guide.chunked/ar01s02.html
  *  internally linking to protocol.txt, which does not exist (line 19)
     <a class="ulink" href="protocol.txt" target="_top">protocol.txt</a>
  *  internally linking to sock-protocol.txt, which does not exist (line 15)
     <a class="ulink" href="sock-protocol.txt" target="_top">sock-protocol.txt</a>
- /home/jim/nut-website/networkupstools.github.io/docs/developer-guide.chunked/ar01s03.html
  *  internally linking to NEWS, which does not exist (line 690)
     <a class="ulink" href="NEWS" target="_top">NEWS</a>
  *  internally linking to UPGRADING, which does not exist (line 692)
     <a class="ulink" href="UPGRADING" target="_top">UPGRADING</a>
  *  internally linking to ci-farm-lxc-setup.txt, which does not exist (line 208)
     <a class="ulink" href="ci-farm-lxc-setup.txt" target="_top">ci-farm-lxc-setup.txt</a>
[2024-05-08T21:23:30.418Z] - output/nut-qa.html
[2024-05-08T21:23:30.418Z]   *  linking to internal hash #NUT_Security that does not exist (line 314)
[2024-05-08T21:23:30.418Z]      <a href="user-manual.html#NUT_Security">security features</a>
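Since many of the roughly 700 complaints repeat the same pattern, it may help to group them before triage. The sketch below counts distinct missing targets in a log shaped like the excerpts above. The function name `summarize` and the regular expressions are assumptions derived only from the message wording quoted in this thread (as printed by the 3.x versions used here); other html-proofer versions may word their messages differently.

```python
import re
from collections import Counter

# Optional Jenkins-style "[2024-05-08T21:23:30.418Z] " prefix on each line.
TIMESTAMP = re.compile(r"^\[[^\]]*\]\s?")
MISSING_FILE = re.compile(r"internally linking to (.+?), which does not exist")
MISSING_HASH = re.compile(r"linking to internal hash (#\S*) that does not exist")


def summarize(log_text):
    """Count complaints per (kind, target) pair found in the log text."""
    counts = Counter()
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        if not line.startswith("*"):
            continue  # file headers and quoted HTML lines are ignored
        match = MISSING_FILE.search(line)
        if match:
            counts[("missing file", match.group(1))] += 1
            continue
        match = MISSING_HASH.search(line)
        if match:
            counts[("missing anchor", match.group(1))] += 1
    return counts
```

Calling `summarize(open("htmlproofer.log").read()).most_common(20)` on a saved log (the file name is hypothetical) would list the most frequently reported targets first.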

Note the links to .txt rather than .html files:

[2024-05-08T21:23:30.415Z] - output/docs/user-manual.chunked/_setting_up_the_multi_arch_linux_lxc_container_farm_for_nut_ci.html
[2024-05-08T21:23:30.415Z]   *  internally linking to config-prereqs.txt, which does not exist (line 338)
[2024-05-08T21:23:30.415Z]      <a class="ulink" href="config-prereqs.txt" target="_top">config-prereqs.txt</a>
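For dangling `.txt` links like these, one quick heuristic is to look for a rendered `.html` file with the same base name next to the referring page. This is only a sketch: the helper name `suggest_html_target` is hypothetical, and the same-basename assumption may well not hold for the chunked outputs, where generated file names differ from the source names.

```python
import os


def suggest_html_target(page_dir, href):
    """Return a sibling .html name for a dangling .txt href, or None."""
    stem, ext = os.path.splitext(href)
    if ext != ".txt":
        return None  # only .txt links are considered here
    if os.path.exists(os.path.join(page_dir, href)):
        return None  # the link resolves as-is, nothing to suggest
    candidate = stem + ".html"
    if os.path.exists(os.path.join(page_dir, candidate)):
        return candidate
    return None
```

A `None` result means either that the link is fine or that no same-named HTML page exists, so the remaining cases still need a manual look at the asciidoc source.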

[2024-05-08T21:23:30.417Z] - output/documentation.html
[2024-05-08T21:23:30.417Z]   *  linking to internal hash #Developer_man that does not exist (line 143)
[2024-05-08T21:23:30.417Z]      <a href="docs/man/index.html#Developer_man">Developer manual pages</a>

Hordes of apcsmart protocol links in particular:

[2024-05-08T21:23:30.419Z] - output/protocols/apcsmart.html
[2024-05-08T21:23:30.419Z]   *  linking to internal hash #@ that does not exist (line 729)
[2024-05-08T21:23:30.419Z]      <a href="#@"><strong></strong></a>
...
[2024-05-08T21:23:30.420Z]   *  linking to internal hash #B that does not exist (line 508)
[2024-05-08T21:23:30.420Z]      <a href="#B">actual voltage</a>
...
[2024-05-08T21:23:30.421Z]      <a href="#D">calibrated</a>
[2024-05-08T21:23:30.421Z]   *  linking to internal hash #D that does not exist (line 558)
...
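To cross-check "internal hash" complaints like these independently of the tool, a page can be scanned for same-page `#fragment` links that have no matching target. The sketch below treats any element `id` and any `<a name="...">` as a valid target, which is my reading of how browsers resolve fragments and not necessarily the rule html-proofer applies. `AnchorScanner` and `missing_fragments` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class AnchorScanner(HTMLParser):
    """Record fragment targets (id, <a name>) and same-page '#...' links."""

    def __init__(self):
        super().__init__()
        self.targets = set()
        self.fragments = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("id"):
            self.targets.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.targets.add(attrs["name"])
        href = attrs.get("href") or ""
        # A bare "#" is skipped; only named same-page fragments are collected.
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.fragments.append(unquote(href[1:]))


def missing_fragments(html_text):
    """Return the sorted same-page fragments that have no target in the page."""
    scanner = AnchorScanner()
    scanner.feed(html_text)
    scanner.close()
    return sorted({frag for frag in scanner.fragments if frag not in scanner.targets})
```

If `missing_fragments(open("output/protocols/apcsmart.html").read())` comes back empty while the tool still complains, that would point towards a false positive in the tool rather than a broken page.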

@jimklimov
Member Author

jimklimov commented May 9, 2024

At least some of the "internal hash" issues can be false-positives of the tool, see gjtorikian/html-proofer#819

Also of note: the 3.14.x and 3.19.x versions on the Debian 12 and Ubuntu 22 workers tried so far are well behind current development (5.0.9 at the moment), which "saved" us from some other false positives but generally constrains the available features.

Not sure whether newer versions offer anything for parallel processing performance, but I can't get it to happen with the 3.1x.y ones here. FWIW, question posted at gjtorikian/html-proofer#840

@jimklimov
Member Author

jimklimov commented May 9, 2024

Custom-building the tool seems possible, but may require a newer ruby (>= 3.1 < 4.0) to run.

Ruby custom install per:

:; git clone -o upstream https://github.com/gjtorikian/html-proofer
:; cd html-proofer
:; gem build html-proofer.gemspec
:; gem install html-proofer-5.0.9.gem

### :; bundle install

This makes the built proofer (and its dependencies) available in the user's local shim environment:

:; which htmlproofer
/home/jim/.asdf/shims/htmlproofer

:; htmlproofer  --version
5.0.9

jimklimov added a commit to networkupstools/nut that referenced this issue May 10, 2024
…le is newer than .git/HEAD [networkupstools/nut-website#52]

    if test -e .git/HEAD && ( rm -f "`find "$@" -not -newer .git/HEAD`" || true ) 2>/dev/null && ls -la .git/HEAD "$@" 2>/dev/null ; then SKIP ; else WORK ; fi

Hopefully this takes care of corner cases:

* No .git/HEAD => WORK (may be unsuccessfully, maybe not - e.g. Git submodules referring to parent)
* `rm` fails, maybe `find` returns empty => DON'T CARE, go to LS
* `ls` fails (one of target files is absent - e.g. ChangeLog removed or never was there) => WORK

Only if the ChangeLog is still there after the attempt on its life, SKIP and keep it
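The decision table above can be restated as a small Python function, which may be easier to reason about than the one-liner. This is a sketch of the intended logic as described in the commit message, not the code that was committed. The name `decide` and its parameters are hypothetical, and "not newer" is taken to mean a modification time less than or equal to that of `.git/HEAD`.

```python
import os


def decide(head, targets):
    """Return "SKIP" if every target survives the staleness purge, else "WORK"."""
    if not os.path.exists(head):
        return "WORK"  # no .git/HEAD: always (try to) regenerate
    head_mtime = os.path.getmtime(head)
    for target in targets:
        try:
            # Mirror `find ... -not -newer HEAD | rm -f`: drop stale targets.
            if os.path.getmtime(target) <= head_mtime:
                os.remove(target)
        except OSError:
            pass  # missing or unremovable: don't care, the check below decides
    # Mirror `ls`: it succeeds only if all targets still exist.
    return "SKIP" if all(os.path.exists(t) for t in targets) else "WORK"
```

For example, `decide(".git/HEAD", ["ChangeLog"])` would return `"SKIP"` only when `ChangeLog` exists and is newer than `.git/HEAD`.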

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to networkupstools/nut that referenced this issue May 10, 2024
…le is newer than anything in a NUT_GITDIR (may be not "./git/" directly) [networkupstools/nut-website#52]

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to networkupstools/nut that referenced this issue May 10, 2024
…w that the parent Makefile knows when to slack off [networkupstools/nut-website#52]

Signed-off-by: Jim Klimov <[email protected]>
jimklimov added a commit to networkupstools/nut that referenced this issue May 16, 2024
…ectory" with some "test" command implementations [networkupstools/nut-website#52]

Some `test`'s evaluate all conditions first and only
handle them via boolean logic later. Oh the audacity!

Signed-off-by: Jim Klimov <[email protected]>