Fix: Prevents MediawikiApi::ApiError Could not normalize image parameters filename.pdf. (urlparamnormal) error
#6187
base: master
… and `.djvu` files and only query mediawiki accepted file formats - Adds a `save_placeholder_thumbnail` method that assigns placeholder values - Doing this solves the `MediawikiApi::ApiError Could not normalize image parameters filename.pdf. (urlparamnormal)`
# mediawiki cannot generate thumbnails for pdf,djvu files
if title.match?(/\.(pdf|djvu)$/i)
  bad_file = CommonsUpload.find_by(file_name: title)
  save_placeholder_thumbnail(bad_file) if bad_file
I don't think this should be done within the `get_urls` method. The method name (`get_something`) implies a read-only method that finds and returns data without changing the database, so writing a placeholder here would be an unexpected side effect.
The UploadImporter#import_urls method would be a more appropriate place to handle setting a placeholder thumbnail.
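A minimal sketch of that separation, assuming a plain `Struct` in place of the real `CommonsUpload` model. The names `Upload`, `PLACEHOLDER`, and `partition_titles`, and the placeholder values themselves, are hypothetical and not taken from the PR; the point is only that the lookup stays read-only while the importer step does the writing.

```ruby
# Hypothetical stand-in for the CommonsUpload model (the real one is an ActiveRecord class).
Upload = Struct.new(:file_name, :thumburl, :thumbwidth, :thumbheight)

# Assumed placeholder values for illustration only.
PLACEHOLDER = { url: 'placeholder', width: 0, height: 0 }.freeze

# Read-only helper: splits titles into [queryable, unsupported] without touching any record.
def partition_titles(titles)
  titles.partition { |title| !title.match?(/\.(pdf|djvu)\z/i) }
end

# The importer step is where callers expect records to change,
# so the placeholder is assigned here rather than in the lookup.
def import_urls(uploads)
  queryable, unsupported = partition_titles(uploads.map(&:file_name))
  uploads.each do |upload|
    next unless unsupported.include?(upload.file_name)

    upload.thumburl = PLACEHOLDER[:url]
    upload.thumbwidth = PLACEHOLDER[:width]
    upload.thumbheight = PLACEHOLDER[:height]
  end
  queryable # titles that would still be sent to the thumbnail query
end
```

With this shape, a caller can use `partition_titles` freely, and only `import_urls` mutates uploads.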
# mediawiki can generate thumbnails for jpg,png,tiff,wav files
# mediawiki cannot generate thumbnails for pdf,djvu files
if title.match?(/\.(pdf|djvu)$/i)
Is there something in the response value that can be used, rather than simply matching on the file suffix? There may be cases where the error happens even with a different filetype, and the set of unsupported filetypes might change over time. It would be better to look for the error/warning within the response, instead of assuming based on file type. If we just wanted to do this based on file type, we could instead set the placeholder at the time the file gets saved, before even making a query for the thumbnail. But I think relying on mediawiki to determine whether a thumbnail can be made is better.
The file names are obtained with the `build_info_query` method, which just returns the page ids and titles; this is an example response.
I've tried changing the parameters of the `image_info_query`. If the `iiurlheight` parameter is removed, for example, the error goes away, but the response contains only the `url` of the files and not their `thumburl`, for both image and non-image files. That means this line in `import_urls`:
file.thumburl = file_url['imageinfo'][0]['thumburl']
would not work, as the entries in the `imageinfo` array would not have the `thumburl` property.
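As an illustration of that failure mode, here is a small sketch of reading `thumburl` defensively, so a missing key yields `nil` instead of a wrong value or an exception. The helper name `thumburl_for` and the sample hashes are hypothetical; only the `imageinfo` / `thumburl` / `url` keys come from the discussion above.

```ruby
# Returns the thumbnail URL from one page entry of an imageinfo response,
# or nil when the entry has no imageinfo array or no thumburl key.
def thumburl_for(file_url)
  file_url.dig('imageinfo', 0, 'thumburl')
end

# Hypothetical response entries, shaped like the line quoted from import_urls.
with_thumb = { 'imageinfo' => [{ 'url' => 'https://example.org/a.jpg',
                                 'thumburl' => 'https://example.org/thumb/a.jpg' }] }
without_thumb = { 'imageinfo' => [{ 'url' => 'https://example.org/b.pdf' }] }
```

A caller could then choose a placeholder whenever `thumburl_for` returns `nil`, without matching on the file suffix.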
It seems that to get the `thumburl` in the response, `iiurlheight` / `iiurlwidth` must be added as a parameter. But apparently the non-image files don't have these properties, hence the error.
Another possible way could be to query for `mediatype` as an `iiprop` and, if the result is an `OFFICE` mediatype, which is what .pdf and .djvu fall under as seen here, add the file to the non-image files. This is a list of mediawiki media types.
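A rough sketch of that mediatype-based split, assuming each page entry carries its media type at `imageinfo[0]['mediatype']` when `mediatype` is requested as an `iiprop`. The constant, the helper name, and the sample pages are hypothetical, and the list of non-thumbnail types contains only `OFFICE` because that is the only one named here.

```ruby
# Assumed list of media types to treat as non-image files; only OFFICE is named in the discussion.
NON_THUMBNAIL_MEDIATYPES = %w[OFFICE].freeze

# Splits page entries into [thumbnail_candidates, non_image_files] based on mediatype.
# Entries with no mediatype are kept as candidates, since nothing says otherwise.
def split_by_mediatype(pages)
  pages.partition do |page|
    mediatype = page.dig('imageinfo', 0, 'mediatype')
    !NON_THUMBNAIL_MEDIATYPES.include?(mediatype)
  end
end

# Hypothetical page entries for illustration.
pages = [
  { 'title' => 'File:Photo.jpg', 'imageinfo' => [{ 'mediatype' => 'BITMAP' }] },
  { 'title' => 'File:Report.pdf', 'imageinfo' => [{ 'mediatype' => 'OFFICE' }] }
]
candidates, non_images = split_by_mediatype(pages)
```

This costs an extra query before the thumbnail request, but it lets mediawiki classify the file rather than the file suffix.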
To my knowledge, the only way to match `Could not normalize image parameters for....` is to allow the error to happen and then handle/rescue it. My concern is that the error is raised and sent to sentry in `wiki_api` at this line: `@mediawiki.send(action, query)`. So if we want to match `Could not normalize image parameters for..` directly, that means rescuing and handling the error in `wiki_api`'s `mediawiki` method, then in commons's `api_get`, which is what you did here.
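For reference, the rescue-based option could look roughly like the sketch below. It uses a stand-in error class so that it runs without the `mediawiki_api` gem; `FakeApiError`, `with_thumbnail_fallback`, and the `:no_thumbnail` return value are all hypothetical, and the assumption that the real error exposes its code (`urlparamnormal`) separately from the message should be checked against the gem.

```ruby
# Stand-in for MediawikiApi::ApiError so the sketch is self-contained (assumption: the real
# error exposes an error code such as 'urlparamnormal').
class FakeApiError < StandardError
  attr_reader :code

  def initialize(code, info)
    @code = code
    super("#{info} (#{code})")
  end
end

# Runs the given query block. Returns :no_thumbnail when the API reports urlparamnormal,
# and re-raises every other API error so it is still reported as before.
def with_thumbnail_fallback
  yield
rescue FakeApiError => e
  raise unless e.code == 'urlparamnormal'

  :no_thumbnail
end
```

Matching on the error code rather than the message text avoids depending on the exact wording of the message.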
So which of these approaches do you think is most suitable?
Okay. I'd really like to avoid this kind of complex error handling, if possible. Do you have a query that triggers it? I'd like to look at the mediawiki response.
What this PR does
- Refactors the `get_urls` method to assign placeholder thumbnails to `.pdf` and `.djvu` files and only query mediawiki accepted file formats
- Adds a `save_placeholder_thumbnail` method that assigns placeholder values
- Doing this solves the `MediawikiApi::ApiError: Could not normalize image parameters filename.pdf. (urlparamnormal)`
The logic used was derived from @ragesoss's logic in this commit, used to solve issue #330: 'Upload importer attempts to get thumbnails over and over for invalid files'. It was later removed as per issue #699: 'Commons spec is failing' in this commit, as mediawiki handled it. It has, however, resurfaced.
Screenshots
Before:
I tested a failing set of pageids:

I then cloned the course that experienced the error: https://outreachdashboard.wmflabs.org/courses/Wikimedia_Indonesia/1Lib1Ref_di_Indonesia_Januari_2025/uploads and altered the code so the upload thumbnails would not be added and ran a manual update:
After refactoring:
Proof (logs / info) it works (I extracted snippets):
Test
Before:


After: