❌ Broken Date Links in Aerospace Biography Articles Page and few more #817

Closed
RatingRishu opened this issue Mar 25, 2025 · 13 comments · Fixed by #822

Comments

@RatingRishu

📌 Issue:

Some date links in the Aerospace Biography Articles page are not working. Clicking on them results in a "Not Found" error.

🔄 Steps to Reproduce:
Go to [Page URL] (if applicable).

Click on any date link (e.g., 2009-04-23).

The link leads to a "Not Found" error instead of the expected content.

✅ Expected Behavior:

Clicking on a date link should navigate to the correct page.

❌ Actual Behavior:

Some date links lead to a "Not Found" page.

💡 Suggested Fix:

Check if the linked URLs are correct.

Ensure the target pages exist.

Fix any incorrect routing or broken paths.

Would appreciate a fix for this issue. Let me know if further details are needed. Thanks!

@audiodude
Member

Thanks for the report!

We use an endpoint in WP1 to perform the redirect. The broken URL is:

https://api.wp1.openzim.org/v1/articles/WP%3AWikiProject%20Aviation%2FAerospace%20biography%20task%20force/2009-04-23T15%3A36%3A30Z/redirect

It's possible that the slash in the article name is causing the link to not be properly routed to the redirect endpoint, though that seems difficult to believe since it is escaped (%2F).

@IWang20

IWang20 commented Mar 25, 2025

Hi, first time here and a bit new to open source as well, can I be assigned to this?

@audiodude
Member

Hi @IWang20. We don't assign issues, but feel free to submit a PR!

@IWang20

IWang20 commented Mar 25, 2025

Gotcha, also is there any way to communicate with other devs other than the issues page on Github?

@audiodude
Member

Yes, see https://wiki.kiwix.org/wiki/Communication and in particular the Kiwix Slack

@benoit74

It is possible that the page is really missing from the ZIM due to the slash in the article name. Slashes in article names are a nightmare to handle because they have two semantics in MediaWiki: Aerospace/Page1 could be either a page named Aerospace/Page1 or a subpage Page1 of the Aerospace page. And I know we still have some bugs around this in mwoffliner.

@audiodude
Member

> It is possible that the page is really missing from the ZIM due to the slash in the article name. Slashes in article names are a nightmare to handle because they have two semantics in MediaWiki: Aerospace/Page1 could be either a page named Aerospace/Page1 or a subpage Page1 of the Aerospace page. And I know we still have some bugs around this in mwoffliner.

This issue has nothing to do with ZIMs. It's not part of any ZIM creation process, but rather the WP1 Bot part of the WP1 web app.

This is the page in question: https://wp1.openzim.org/#/project/Aerospace%20biography/articles?quality=Project-Class

For an example of pages that work, see: https://wp1.openzim.org/#/project/Aerospace%20biography/articles?quality=GA-Class

On the latter, clicking a timestamp takes you to the page revision at that time, presumably so that you can verify that the version of the page "really is GA quality".

@elfkuzco
Contributor

I have tried this on a couple of articles with the Project-Class quality and they all seem to be broken. I think it might have more to do with the code that gets the revision id in wp1/web/articles.py than with the links.

@audiodude
Member

Yes, it is possible that the slash needs to be escaped when retrieving the page from the API.

@Mightymanh
Contributor

The main problem is that the broken link never enters the route "<name>/<timestamp>/redirect" in wp1/web/articles.py. The reason is that Flask decodes the URL before matching it against the routes; see https://stackoverflow.com/questions/24519076/python-flask-url-encoded-leading-slashes-causing-404-or-405.

My solution is to double-encode the name section of the URL in the frontend and decode it in the backend. I can't test whether it redirects correctly because I don't have a Toolforge API key, but the fix in the pull request I linked above should allow the broken URL to be processed by the articles router, which can then recover the original name and timestamp.
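
For illustration, the double-encoding idea can be sketched with Python's standard library alone. This is not the code from the pull request: `double_encode` is a hypothetical helper, and the single `unquote` call only stands in for the decode that happens before routing. In the real frontend the equivalent would presumably be two passes of the browser's own percent-encoding.

```python
from urllib.parse import quote, unquote


def double_encode(name):
    """Percent-encode twice so that one decode still leaves no literal '/'."""
    once = quote(name, safe="")   # '/' -> '%2F', ':' -> '%3A', ' ' -> '%20'
    return quote(once, safe="")   # '%' -> '%25', so '%2F' -> '%252F'


name = "WP:WikiProject Aviation/Aerospace biography task force"
sent = double_encode(name)

# Stand-in for the decode performed before route matching (an assumption
# about where the decode happens, taken from the discussion above).
seen_by_router = unquote(sent)

# The router still sees an escaped slash, so the name stays one path segment.
assert "/" not in seen_by_router

# The handler decodes once more to recover the original article name.
recovered = unquote(seen_by_router)
print(recovered == name)
```

The cost of this approach is that both sides have to agree on the extra layer of encoding, and a client that encodes only once would still break.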

@audiodude
Member

I don't think that SO answer is the exact problem we're seeing here. In their case, the URL param has a "leading" slash (ie /foo/bar/baz.txt), but our URLs don't have that (https://api.wp1.openzim.org/v1/articles/WP%3AWikiProject%20Aviation%2FAerospace%20biography%20task%20force/2009-04-23T15%3A36%3A30Z/redirect). It is still possible that the URL routing is not working, but it's also possible that the API lookup of the article is failing because at that point the article name is NOT URL encoded, and we are hitting this line: https://github.com/openzim/wp1/blob/main/wp1/web/articles.py#L18

@Mightymanh
Contributor

> I don't think that SO answer is the exact problem we're seeing here. In their case, the URL param has a "leading" slash (ie /foo/bar/baz.txt), but our URLs don't have that (https://api.wp1.openzim.org/v1/articles/WP%3AWikiProject%20Aviation%2FAerospace%20biography%20task%20force/2009-04-23T15%3A36%3A30Z/redirect). It is still possible that the URL routing is not working, but it's also possible that the API lookup of the article is failing because at that point the article name is NOT URL encoded, and we are hitting this line: https://github.com/openzim/wp1/blob/main/wp1/web/articles.py#L18

The nature of Flask is that it decodes the URL before it matches the routing. When we send https://api.wp1.openzim.org/v1/articles/WP%3AWikiProject%20Aviation%2FAerospace%20biography%20task%20force/2009-04-23T15%3A36%3A30Z/redirect to the backend, Flask decodes it as https://api.wp1.openzim.org/v1/articles/WP:WikiProject Aviation/Aerospace biography task force/2009-04-23T15:36:30Z/redirect. Flask then uses the decoded URL to match the routing. The decoded URL never matches the article route <name>/<timestamp>/redirect because the decoded URL has 4 sections after /v1/articles/ but the article route needs 3.
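
The segment count can be checked with a small standard-library sketch. It is a simplified model of "decode first, then split on slashes", not Flask's actual router; `segments_after_decoding` and `ROUTE_SEGMENTS` are hypothetical names introduced only for this illustration.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical stand-in for the pattern <name>/<timestamp>/redirect.
ROUTE_SEGMENTS = 3


def segments_after_decoding(url, prefix="/v1/articles/"):
    """Percent-decode the whole path first, then split what follows the prefix."""
    path = unquote(urlsplit(url).path)
    return path[len(prefix):].split("/")


broken = (
    "https://api.wp1.openzim.org/v1/articles/"
    "WP%3AWikiProject%20Aviation%2FAerospace%20biography%20task%20force/"
    "2009-04-23T15%3A36%3A30Z/redirect"
)

parts = segments_after_decoding(broken)
for part in parts:
    print(part)

# The escaped %2F has become a real separator, so the counts differ.
print(len(parts), "segments found,", ROUTE_SEGMENTS, "expected")
```

Under this model the article name is split into "WP:WikiProject Aviation" and "Aerospace biography task force", which is why a three-part pattern cannot match.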

@audiodude
Member

audiodude commented Mar 28, 2025

That makes sense, and I think you're right.

However, it's probably more straightforward to forgo "pretty URLs" than to double encode the parameters. So we could re-write the route to:

/v1/articles/redirect

and pass the url as:

/v1/articles/redirect?article=WP%3AWikiProject%20Aviation%2FAerospace%20biography%20task%20force&timestamp=2009-04-23T15%3A36%3A30Z

The URL is only really seen by the API backend so it shouldn't affect the user experience.
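
As a rough sketch of why the query-string form sidesteps the routing problem, the snippet below builds and parses such a URL with Python's standard library. The parameter names `article` and `timestamp` come from the example above; `build_redirect_url` is a hypothetical helper, and this is not the actual WP1 frontend or backend code.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def build_redirect_url(article, timestamp):
    """Put both values in the query string instead of the path."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"article": article, "timestamp": timestamp}, quote_via=quote)
    return "/v1/articles/redirect?" + query


url = build_redirect_url(
    "WP:WikiProject Aviation/Aerospace biography task force",
    "2009-04-23T15:36:30Z",
)
print(url)

# The path is fixed, so decoding it can never change the number of segments.
split = urlsplit(url)
print(split.path)

# The query string is parsed separately, and the slash in the name survives.
params = parse_qs(split.query)
print(params["article"][0])
print(params["timestamp"][0])
```

Because the slash now lives in the query string, it no longer takes part in path matching at all, whether or not it is escaped.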

All of this just fixes the routing though, and you still have the issue with the API lookup as described in #822 (comment)
