Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix broken link bug for article's name that includes slash #822

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

Mightymanh
Copy link

@Mightymanh Mightymanh commented Mar 28, 2025

Issue: Broken Date Links in Articles having name that contains slash

Fixes #817

Changes

In frontend, double encode the article name.
In backend, decode the encoded article name (Flask decode the first time, and we decode it the second time to get original name).

@audiodude
Copy link
Member

This doesn't work. For this URL:

http://localhost:5000/v1/articles/WP%253AVersion%25201.0%2520Editorial%2520Team%252FAesthetics%2520articles%2520by%2520quality%2520log/2010-06-21T18%3A12%3A55Z/redirect

I get:

image

Traceback (most recent call last)
File "/Users/tmoney/.local/share/virtualenvs/wp1-gH9A0XCZ/lib/python3.12/site-packages/flask/app.py", line 2213, in __call__
return self.wsgi_app(environ, start_response)
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/.local/share/virtualenvs/wp1-gH9A0XCZ/lib/python3.12/site-packages/flask/app.py", line 2193, in wsgi_app
response = self.handle_exception(e)
           ^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/.local/share/virtualenvs/wp1-gH9A0XCZ/lib/python3.12/site-packages/flask_cors/extension.py", line 194, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
                                            ^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/.local/share/virtualenvs/wp1-gH9A0XCZ/lib/python3.12/site-packages/flask/app.py", line 2190, in wsgi_app
response = self.full_dispatch_request()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/.local/share/virtualenvs/wp1-gH9A0XCZ/lib/python3.12/site-packages/flask/app.py", line 1486, in full_dispatch_request
rv = self.handle_user_exception(e)
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/.local/share/virtualenvs/wp1-gH9A0XCZ/lib/python3.12/site-packages/flask_cors/extension.py", line 194, in wrapped_function
return cors_after_request(app.make_response(f(*args, **kwargs)))
                                            ^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/.local/share/virtualenvs/wp1-gH9A0XCZ/lib/python3.12/site-packages/flask/app.py", line 1484, in full_dispatch_request
rv = self.dispatch_request()
     ^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/.local/share/virtualenvs/wp1-gH9A0XCZ/lib/python3.12/site-packages/flask/app.py", line 1469, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/code/wp1/wp1/web/articles.py", line 14, in redirect
revid = get_revision_id_by_timestamp(page, timestamp)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/tmoney/code/wp1/wp1/api.py", line 90, in get_revision_id_by_timestamp
rev = next(page.revisions(start=timestamp, limit=1, prop='ids'))
           ^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'revisions'

Which seems to indicate that the value being passed from the API is None, which supports my assumption in #817 (comment) that the API is unable to locate the page.

However, the fact that it got "further" down the stack seems to support your idea that it was a routing issue.

By the way, you don't need a toolforge account to test this PR. If you run the dev Docker compose you will get a development database that contains data, and you can test a similar article to the bug here: http://localhost:5173/#/project/Aesthetics/articles?quality=Project-Class&importance=NA-Class

@Mightymanh
Copy link
Author

With the original code, I tested article with non-slashed link and it still shows that NoneType error. Thats why I think I need API key or maybe I miss setting up something for development.

In the wp1/credentials.py, I only change the host and port to 127.0.0.1 as below so the backend can communicate with the database enwp10_dev and the redis
image

Try to show all databases in the docker container wp1bot-db-dev and there is enwp10_dev
image

Can you try running the original code, navigate to http://localhost:5173/#/project/Alexandra%20Stan/articles?quality=Assessed-Class, and click any link that has no slash? I clicked the first link and it shows NoneType error.

image

@audiodude
Copy link
Member

Yes you're right, I get the NoneType error on all timestamp links, regardless if they have a slash or not.

You still don't need toolforge credentials though. It turns out this comment is wrong: https://github.com/openzim/wp1/blob/main/wp1/credentials.py.example#L63

The API credentials are needed for this redirect logic to work.

All you need is a Wikipedia user name and password though:

credentials.py

        'API': {
          'user': 'MyFunkyWikipediaUserName',
          'pass': 'my_pass,
        },

@audiodude
Copy link
Member

Anyways, please refer to #817 (comment). I would prefer a solution that uses non "pretty" URL parameters, rather than double encoding.

@Mightymanh
Copy link
Author

Anyways, please refer to #817 (comment). I would prefer a solution that uses non "pretty" URL parameters, rather than double encoding.

i changed double-encoding url to url with query parameters. Tested with a lot of broken links and non-broken links and they work. There is this link that is not found: http://localhost:5000/v1/articles/redirect?name=Mami%20(Alexandra%20Stan%20song)&timestamp=2018-02-23T21%3A51%3A09Z, but I tested it: it enters the article router and exit with 404 status.

Copy link
Member

@audiodude audiodude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@audiodude
Copy link
Member

There is this link that is not found: http://localhost:5000/v1/articles/redirect?name=Mami%20(Alexandra%20Stan%20song)&timestamp=2018-02-23T21%3A51%3A09Z, but I tested it: it enters the article router and exit with 404 status.

Yes, the article has been redirected since the last scrape. This is expected. (https://en.wikipedia.org/wiki/Mami_(song))

@Mightymanh
Copy link
Author

image

I forget that test case for article will not work on the new change because the routing changes.

@Mightymanh
Copy link
Author

@audiodude Hello this pull request still fails due to test case not being re-implemented. Do you want me to make a minor change in the test case for ArticleTest? I see that we just need to edit the link in the function client.get()

@Mightymanh
Copy link
Author

The API credentials are needed for this redirect logic to work.

All you need is a Wikipedia user name and password though:

credentials.py

        'API': {
          'user': 'MyFunkyWikipediaUserName',
          'pass': 'my_pass,
        },

I think you need to include this in the README file. Set up code is not too bad but the problem is credentials.py and sometimes I dont know if I miss any credentials that are essential for development, unless after I mention you about this.

@audiodude
Copy link
Member

@audiodude Hello this pull request still fails due to test case not being re-implemented. Do you want me to make a minor change in the test case for ArticleTest? I see that we just need to edit the link in the function client.get()

Yes, fixing/writing tests is part of a valid PR.

I think you need to include this in the README file. Set up code is not too bad but the problem is credentials.py and sometimes I dont know if I miss any credentials that are essential for development, unless after I mention you about this.

Absolutely right. As I mentioned, the comment is wrong and needs to be updated.

@Mightymanh
Copy link
Author

Test cases are fixed and properly tested in local environment.

Copy link

codecov bot commented Apr 1, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.58%. Comparing base (605d590) to head (fc75f33).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #822   +/-   ##
=======================================
  Coverage   91.57%   91.58%           
=======================================
  Files          66       66           
  Lines        3597     3600    +3     
=======================================
+ Hits         3294     3297    +3     
  Misses        303      303           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

In frontend, double encode the article name.
In backend, decode the encoded article name (Flask decode the first time, and we decode it the second time to get original name).
change route from <name>/<timestamp>/redirect -> /redirect?name=&timestamp=
@audiodude audiodude force-pushed the bug/broken-date-link-in-article branch from 6e4e883 to fc75f33 Compare April 1, 2025 06:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

❌ Broken Date Links in Aerospace Biography Articles Page and few more
2 participants